From: Al Viro <viro@ZenIV.linux.org.uk>
To: Andy Lutomirski <luto@amacapital.net>
Cc: David Drysdale <drysdale@google.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Meredydd Luff <meredydd@senatehouse.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Andrew Morton <akpm@linux-foundation.org>,
Kees Cook <keescook@chromium.org>, Arnd Bergmann <arnd@arndb.de>,
X86 ML <x86@kernel.org>, linux-arch <linux-arch@vger.kernel.org>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCHv4 RESEND 0/3] syscalls,x86: Add execveat() system call
Date: Sun, 19 Oct 2014 21:20:34 +0100
Message-ID: <20141019202034.GH7996@ZenIV.linux.org.uk>
In-Reply-To: <CALCETrVraoD+r4zxBoGd+BV5P275AXcRV_R00SSr8fjQzRHnUg@mail.gmail.com>
On Fri, Oct 17, 2014 at 02:45:03PM -0700, Andy Lutomirski wrote:
> For example, I want to be able to reliably do something like nsenter
> --namespace-flags-here toybox sh. Toybox's shell is unusual in that
> it is more or less fully functional, so this should Just Work (tm),
> except that the toybox binary might not exist in the namespace being
> entered. If execveat were available, I could rig nsenter or a similar
> tool to open it with O_CLOEXEC, enter the namespace, and then call
> execveat.
The question I hadn't seen really answered through all of that was how to
deal with #!... "Just use d_path()" isn't particularly appealing - if that
file has a pathname reachable for you, you could bloody well use _that_
from the very beginning.
Frankly, I wonder if it would make sense to provide something like
dupfs. We can't mount it by default on /dev/fd (more's the pity), but
it might be a good thing to have.
What it is, for those who are not familiar with Plan 9: a filesystem with
one directory and a bunch of files in it. Directory contents depend on
who's looking; for each opened descriptor in your descriptor table, you'll
see two files there. One series is 0, 1, ... - opening one of those gives
dup(). IOW, it's *not* giving you a new struct file; it gives you a new
reference to the existing one, complete with shared IO position, etc. The other
is 0ctl, 1ctl, ... - those are read-only and reading from them gives pretty
much a combination of our /proc/self/fdinfo/n with readlink of /proc/self/fd/n.
It's actually a better match for what one would expect at /dev/fd than what
we do. Example:

    ; echo 'read i; cat /dev/fd/0; echo "The first line was $i"' >a.sh
    ; (echo 'line 1';echo 'line 2') >a
    ; cat a|sh a.sh
    line 2
    The first line was line 1
    ; sh a.sh <a
    line 1
    line 2
    The first line was line 1
    ;

See what's going on? Opening /dev/fd/0 (aka /dev/stdin) does a fresh open
of whatever your stdin is; if it's a pipe - fine, you've just added yourself
as additional reader. But if it's a regular file, you've got yourself
a brand-new opened file, with IO position of its own. Sitting at the
beginning of the file.
Moreover, try that with stdin being a socket and you'll see cat(1) failing
to open that sucker.
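
The difference between the two behaviours can be checked from userspace. What follows is only a minimal sketch, assuming a Linux box where /dev/fd is the usual symlink to /proc/self/fd; the helper names (make_sample_fd, offset_via_dup, offset_via_dev_fd) are made up for this illustration and are not part of any proposal in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create an unlinked temporary file holding two lines and leave the
 * descriptor positioned just past the first one (offset 7).
 * Returns the descriptor, or -1 on error.
 */
int make_sample_fd(void)
{
    char path[] = "/tmp/dupdemoXXXXXX";
    static const char data[] = "line 1\nline 2\n";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, data, sizeof(data) - 1) != (ssize_t)(sizeof(data) - 1) ||
        lseek(fd, 7, SEEK_SET) != 7) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Position seen through dup(): same struct file, so the same offset. */
off_t offset_via_dup(int fd)
{
    int d;
    off_t pos;

    assert(fd >= 0);
    d = dup(fd);
    if (d < 0)
        return -1;
    pos = lseek(d, 0, SEEK_CUR);
    close(d);
    return pos;
}

/*
 * Position seen through a fresh open of /dev/fd/<fd>.
 * Returns -1 if that name cannot be opened at all.
 */
off_t offset_via_dev_fd(int fd)
{
    char name[32];
    int r;
    off_t pos;

    assert(fd >= 0);
    snprintf(name, sizeof(name), "/dev/fd/%d", fd);
    r = open(name, O_RDONLY);
    if (r < 0)
        return -1;
    pos = lseek(r, 0, SEEK_CUR);
    close(r);
    return pos;
}
```

With a descriptor from make_sample_fd(), offset_via_dup() reports 7, while on Linux offset_via_dev_fd() reports 0, which is the "brand-new opened file" effect from the transcript. Under dup-style semantics both would report 7.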
We _can't_ blindly replace /dev/fd with it - it has to be a sysadmin choice;
the semantics are different. However, there's no reason why it can't be mounted
in environments where you want to avoid procfs - it's certainly exposing less
than procfs would.
And these days we can implement it relatively cheaply. It's a window that will
close after a while, but right now we can change ->atomic_open() calling
conventions. Instead of having it return 0 or error, let's switch to returning
NULL, ERR_PTR(error) *or* an extra reference to a preexisting struct file.
Same as we did for ->lookup(), and for a similar reason.
Right now we have 8 instances of ->atomic_open() and one place calling that
method. Changing the API like that would be trivial (and it's a trivial
conversion - replace return ret; with return ERR_PTR(ret); through all
instances, so any out-of-tree filesystems could follow easily). We certainly
can't do anything of that sort with ->open() - there would be thousands of
instances to convert. ->atomic_open(), OTOH, is still new enough for
that to be feasible.
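
To make the three-way return convention concrete, here is a userspace toy, not kernel code. ERR_PTR(), PTR_ERR() and IS_ERR() below are simplified stand-ins modeled on the kernel helpers of the same names; struct toy_file, toy_atomic_open() and classify() are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
int IS_ERR(const void *ptr) { return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO; }

/* Toy stand-in for struct file: nothing but a reference count. */
struct toy_file { int refcount; };

/*
 * Hypothetical method using the proposed convention:
 *   NULL            - nothing special, carry on as "return 0" did before
 *   ERR_PTR(-errno) - failure
 *   anything else   - an extra reference to a preexisting file
 */
struct toy_file *toy_atomic_open(struct toy_file **fdtable, int nfds,
                                 int fd, int dup_style)
{
    assert(fdtable != NULL);
    if (fd < 0 || fd >= nfds || !fdtable[fd])
        return ERR_PTR(-ENOENT);
    if (!dup_style)
        return NULL;
    fdtable[fd]->refcount++;    /* caller now holds one more reference */
    return fdtable[fd];
}

enum outcome { OPENED_NORMALLY, GOT_EXISTING_FILE, FAILED };

/* What the single caller would have to tell apart. */
enum outcome classify(const struct toy_file *ret)
{
    if (!ret)
        return OPENED_NORMALLY;
    if (IS_ERR(ret))
        return FAILED;
    return GOT_EXISTING_FILE;
}
```

Note that ERR_PTR(0) is NULL with these definitions, which is why the mechanical "return ret;" to "return ERR_PTR(ret);" conversion keeps the old success case intact.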
What we get from that conversion is the ability to do dup-style semantics
easily.
	* give root directory an ->atomic_open() instance that would be
	  handling opens.
	* make lookups in there fail with ENOENT if you don't have such a
	  descriptor at the moment. Otherwise bind all of them to the same inode.
	  The only method it needs is ->getattr(), and that would look into your
	  descriptor table for descriptor with number derived from dentry (stashed
	  in ->d_fsdata at lookup time) and do what fstat() would.
	* have those dentries always fail ->d_revalidate(), to force
	  everything towards ->atomic_open().
	* for ...ctl names, ->atomic_open() would act in normal fashion;
	  again, only one inode is needed. ->read() would pick descriptor number
	  from ->d_fsdata and report on whatever you have with that number at the
	  time.
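
The "number derived from dentry" step could look roughly like the following. This is a userspace guess at the name handling only: parse_dupfs_name() is a hypothetical helper, and rejecting leading zeros is an assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>

/*
 * Split a directory entry name into a descriptor number and a flag
 * saying whether it is the "<n>ctl" flavour. Accepts "0", "1", ...
 * and "0ctl", "1ctl", ...; everything else is -ENOENT.
 * Returns 0 on success and fills *fd and *is_ctl.
 */
int parse_dupfs_name(const char *name, int *fd, int *is_ctl)
{
    const char *p = name;
    long long n = 0;

    assert(name && fd && is_ctl);

    if (*p < '0' || *p > '9')
        return -ENOENT;
    /* Assumption: names are canonical, so "01" is not an alias for "1". */
    if (*p == '0' && p[1] >= '0' && p[1] <= '9')
        return -ENOENT;

    while (*p >= '0' && *p <= '9') {
        n = n * 10 + (*p - '0');
        if (n > INT_MAX)
            return -ENOENT;
        p++;
    }

    if (*p == '\0')
        *is_ctl = 0;
    else if (strcmp(p, "ctl") == 0)
        *is_ctl = 1;
    else
        return -ENOENT;

    *fd = (int)n;
    return 0;
}
```

A real lookup would go on to check that the caller actually has such a descriptor open and stash the number in ->d_fsdata; none of that is modeled here.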
I'll try to put a prototype of that together; I think it's at least
interesting to try. And that ought to be safe to mount even in very
restricted environments, making arguments along the lines of "but we can't
get the path by opened file without the big bad wol^Wprocfs and we can't
have that in our environment" much weaker...
Comments?