* [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
@ 2014-10-30 9:08 Eric Rannaud
2014-10-30 21:48 ` Andy Lutomirski
0 siblings, 1 reply; 23+ messages in thread
From: Eric Rannaud @ 2014-10-30 9:08 UTC (permalink / raw)
To: linux-kernel
Cc: Alexander Viro, linux-fsdevel, Andrew Morton, Linus Torvalds,
Eric Rannaud
The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:
Note that this mode applies only to future accesses of the newly
created file; the open() call that creates a read-only file
may well return a read/write file descriptor.
The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.
O_TMPFILE, however, behaves differently:
int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
assert(fd == -1);
assert(errno == EACCES);
int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
assert(fd > 0);
For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
if (*opened & FILE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
path_to_nameidata(path, nd);
goto finish_open_created;
}
But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().
This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().
A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523
The glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the 2
argument and the 3 argument forms of open; same idea for openat()).
However, the glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.
On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.
But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.
Signed-off-by: Eric Rannaud <e@nanocritical.com>
---
fs/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/namei.c b/fs/namei.c
index 42df664e95e5..78512898d3ba 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3154,7 +3154,8 @@ static int do_tmpfile(int dfd, struct filename *pathname,
if (error)
goto out2;
audit_inode(pathname, nd->path.dentry, 0);
- error = may_open(&nd->path, op->acc_mode, op->open_flag);
+ /* Don't check for other permissions, the inode was just created */
+ error = may_open(&nd->path, MAY_OPEN, op->open_flag);
if (error)
goto out2;
file->f_path.mnt = nd->path.mnt;
--
2.1.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-30 9:08 [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0 Eric Rannaud
@ 2014-10-30 21:48 ` Andy Lutomirski
2014-10-30 22:48 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2014-10-30 21:48 UTC (permalink / raw)
To: Eric Rannaud, linux-kernel
Cc: Alexander Viro, linux-fsdevel, Andrew Morton, Linus Torvalds
On 10/30/2014 02:08 AM, Eric Rannaud wrote:
> The man page for open(2) indicates that when O_CREAT is specified, the
> 'mode' argument applies only to future accesses to the file:
>
> Note that this mode applies only to future accesses of the newly
> created file; the open() call that creates a read-only file
> may well return a read/write file descriptor.
>
> The man page for open(2) implies that 'mode' is treated identically by
> O_CREAT and O_TMPFILE.
>
> O_TMPFILE, however, behaves differently:
>
> int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
> assert(fd == -1);
> assert(errno == EACCES);
>
> int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
> assert(fd > 0);
>
> For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
>
> if (*opened & FILE_CREATED) {
> /* Don't check for write permission, don't truncate */
> open_flag &= ~O_TRUNC;
> will_truncate = false;
> acc_mode = MAY_OPEN;
> path_to_nameidata(path, nd);
> goto finish_open_created;
> }
>
> But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
> may_open().
>
> This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
> inode is created, may_open() is called with acc_mode = MAY_OPEN, in
> do_tmpfile().
>
> A different, but related glibc bug revealed the discrepancy:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17523
>
> The glibc lazily loads the 'mode' argument of open() and openat() using
> va_arg() only if O_CREAT is present in 'flags' (to support both the 2
> argument and the 3 argument forms of open; same idea for openat()).
> However, the glibc ignores the 'mode' argument if O_TMPFILE is in
> 'flags'.
>
> On x86_64, for open(), it magically works anyway, as 'mode' is in
> RDX when entering open(), and is still in RDX on SYSCALL, which is where
> the kernel looks for the 3rd argument of a syscall.
>
> But openat() is not quite so lucky: 'mode' is in RCX when entering the
> glibc wrapper for openat(), while the kernel looks for the 4th argument
> of a syscall in R10. Indeed, the syscall calling convention differs from
> the regular calling convention in this respect on x86_64. So the kernel
> sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
> fails with EACCES.
Looks sensible. Should this be Cc: stable?
--Andy
>
> Signed-off-by: Eric Rannaud <e@nanocritical.com>
> ---
> fs/namei.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 42df664e95e5..78512898d3ba 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3154,7 +3154,8 @@ static int do_tmpfile(int dfd, struct filename *pathname,
> if (error)
> goto out2;
> audit_inode(pathname, nd->path.dentry, 0);
> - error = may_open(&nd->path, op->acc_mode, op->open_flag);
> + /* Don't check for other permissions, the inode was just created */
> + error = may_open(&nd->path, MAY_OPEN, op->open_flag);
> if (error)
> goto out2;
> file->f_path.mnt = nd->path.mnt;
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-30 21:48 ` Andy Lutomirski
@ 2014-10-30 22:48 ` Linus Torvalds
2014-10-30 22:58 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2014-10-30 22:48 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Eric Rannaud, Linux Kernel Mailing List, Alexander Viro,
linux-fsdevel, Andrew Morton
On Thu, Oct 30, 2014 at 2:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Looks sensible. Should this be Cc: stable?
Agreed. Will apply and add the stable cc.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-30 22:48 ` Linus Torvalds
@ 2014-10-30 22:58 ` Linus Torvalds
2014-10-31 0:01 ` Andy Lutomirski
2014-10-31 0:57 ` Eric Rannaud
0 siblings, 2 replies; 23+ messages in thread
From: Linus Torvalds @ 2014-10-30 22:58 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Eric Rannaud, Linux Kernel Mailing List, Alexander Viro,
linux-fsdevel, Andrew Morton
On Thu, Oct 30, 2014 at 3:48 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Agreed. Will apply and add the stable cc.
Ho humm. Thinking about this some more, I'm starting to wonder. Not
about this patch per se (open on a newly created file should indeed
succeed regardless), but about the horrible glibc behavior of screwing
up the third argument.
If you want to do O_TMPFILE + linkat() (or some eventual future
flink()), the mode really matters. So this idiotic glibc behavior of
only forwarding the third argument if O_CREAT is set seems to be a
bug.
Why the hell does glibc think it's a good idea to intersect system
call semantics? It's not a good idea - it's just stupid in the
extreme. And in this case it seems to actively breaks things.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-30 22:58 ` Linus Torvalds
@ 2014-10-31 0:01 ` Andy Lutomirski
2014-10-31 0:59 ` Linus Torvalds
2014-10-31 8:42 ` Christoph Hellwig
2014-10-31 0:57 ` Eric Rannaud
1 sibling, 2 replies; 23+ messages in thread
From: Andy Lutomirski @ 2014-10-31 0:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Eric Rannaud, Linux Kernel Mailing List, Alexander Viro,
linux-fsdevel, Andrew Morton
On Thu, Oct 30, 2014 at 3:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Oct 30, 2014 at 3:48 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Agreed. Will apply and add the stable cc.
>
> Ho humm. Thinking about this some more, I'm starting to wonder. Not
> about this patch per se (open on a newly created file should indeed
> succeed regardless), but about the horrible glibc behavior of screwing
> up the third argument.
>
> If you want to do O_TMPFILE + linkat() (or some eventual future
> flink()), the mode really matters. So this idiotic glibc behavior of
> only forwarding the third argument if O_CREAT is set seems to be a
> bug.
We could bite the bullet and add a tmpfile syscall. /me ducks
>
> Why the hell does glibc think it's a good idea to intersect system
> call semantics? It's not a good idea - it's just stupid in the
> extreme. And in this case it seems to actively breaks things.
Uh, because it's glibc? Or because it's trying not to screw up and on
some system where overrunning va_arg is terrible?
>
> Linus
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-31 0:01 ` Andy Lutomirski
@ 2014-10-31 0:59 ` Linus Torvalds
2014-10-31 8:42 ` Christoph Hellwig
1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2014-10-31 0:59 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Eric Rannaud, Linux Kernel Mailing List, Alexander Viro,
linux-fsdevel, Andrew Morton
On Thu, Oct 30, 2014 at 5:01 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Uh, because it's glibc?
Yeah. Bloated, over-engineered, and stupid.
> Or because it's trying not to screw up and on
> some system where overrunning va_arg is terrible?
No. On 99% of architectures the third argument is in a register
anyway, and traditionally it's not even va_arg, although glibc has
made it so (traditionally it's just pre-ANSI C with three arguments
and one of them might be missing - gcc has had hacks for avoiding
warnings for traditional C things like that: look at the whole
transparent union thing for another traditional "C without prototypes"
calling convention case).
But even if you make it va_arg, I can't think of a single architecture
where that makes sense. Outside of assembly trampolines, you *always*
have enough stack space that you can just access a word under the
stack anyway.
But yes, I could imagine some well-meaning - but not overly smart -
glibc developer deciding that doing the va_arg thing conditionally
would be a "feature". Despite making the code slower, bigger, and
buggier.
I guess I'll fetch the git tree and see if they document this braindamage..
[ time passes ]
Ugh. It seems to predate even the imported history (going back all the
way to 1995 - I don't know if that was SVN or CVS and whether there is
some even older historical archives that were never imported).
Anyway, since the beginning of time, the "stub/open.c" file is a True
Work of Art (TM)(also sarcasm), and has an old-style C declaration
(not ANSI) for __libc_open(), and uses a conditional va_arg() to get
the third parameter *despite* not even being a variadic function (not
varargs, not stdarg). So it's not even portable or correct *anyway*,
and it unnecessarily generates bad code and seems to have been
mindlessly copied into all the actual real non-stub implementations.
Most of them seem to have made their definitions match the declaration
in the process, so they then really do have the variadic part. Goodie,
I guess, except for this all being unnecessary crap and stupid.
Oh well. What a cock-up.
The code is insane in other ways too. The actual real Linux version of
__libc_open() ends up (for no good reason except to compete with
cat-ladies in the "crazy person of the year" award) using
"openat(AT_FDCWD)", just so you can make sure that the result doesn't
possibly work on old versions of the kernel even by mistake. I
certainly cannot possibly see any actual *advantage* to using
"openat()", but them I'm not a homeless cat-lady. It also has some
magic "LIBC_CANCEL_ASYNC()/LIBC_CANCEL_RESET()" stuff around it, which
I'm sure is entirely sane.
I can't take it any more. That code is crazy.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-31 0:01 ` Andy Lutomirski
2014-10-31 0:59 ` Linus Torvalds
@ 2014-10-31 8:42 ` Christoph Hellwig
2014-10-31 18:44 ` Andy Lutomirski
1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2014-10-31 8:42 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Eric Rannaud, Linux Kernel Mailing List,
Alexander Viro, linux-fsdevel, Andrew Morton
On Thu, Oct 30, 2014 at 05:01:30PM -0700, Andy Lutomirski wrote:
> > flink()), the mode really matters. So this idiotic glibc behavior of
> > only forwarding the third argument if O_CREAT is set seems to be a
> > bug.
>
> We could bite the bullet and add a tmpfile syscall. /me ducks
I've got another use case for that: Samba. It wants to inherit
all kinds of attributes (xattrs, modes, etc) during file creation,
and right now it's doing it in a racy way. I've come up with a patch
to use O_TEMPFILE + flink, but it turns out that Samba may as well
be asked to create read-only files, which we can't create using
O_TMPFILE.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-31 8:42 ` Christoph Hellwig
@ 2014-10-31 18:44 ` Andy Lutomirski
2014-10-31 18:53 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2014-10-31 18:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Al Viro, linux-kernel@vger.kernel.org, Eric Rannaud,
Andrew Morton, Linus Torvalds, linux-fsdevel
On Oct 31, 2014 1:42 AM, "Christoph Hellwig" <hch@infradead.org> wrote:
>
> On Thu, Oct 30, 2014 at 05:01:30PM -0700, Andy Lutomirski wrote:
> > > flink()), the mode really matters. So this idiotic glibc behavior of
> > > only forwarding the third argument if O_CREAT is set seems to be a
> > > bug.
> >
> > We could bite the bullet and add a tmpfile syscall. /me ducks
>
> I've got another use case for that: Samba. It wants to inherit
> all kinds of attributes (xattrs, modes, etc) during file creation,
> and right now it's doing it in a racy way. I've come up with a patch
> to use O_TEMPFILE + flink, but it turns out that Samba may as well
> be asked to create read-only files, which we can't create using
> O_TMPFILE.
Does the patch in this thread not fix that?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-31 18:44 ` Andy Lutomirski
@ 2014-10-31 18:53 ` Linus Torvalds
2014-11-03 8:34 ` Christoph Hellwig
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2014-10-31 18:53 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Christoph Hellwig, Al Viro, linux-kernel@vger.kernel.org,
Eric Rannaud, Andrew Morton, linux-fsdevel
On Fri, Oct 31, 2014 at 11:44 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Does the patch in this thread not fix that?
It should. Modulo the glibc problem that makes it hard to actually set
the mode properly.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-31 18:53 ` Linus Torvalds
@ 2014-11-03 8:34 ` Christoph Hellwig
2014-11-03 17:05 ` Linus Torvalds
2014-11-03 17:06 ` Andy Lutomirski
0 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2014-11-03 8:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Christoph Hellwig, Al Viro,
linux-kernel@vger.kernel.org, Eric Rannaud, Andrew Morton,
linux-fsdevel
On Fri, Oct 31, 2014 at 11:53:09AM -0700, Linus Torvalds wrote:
> On Fri, Oct 31, 2014 at 11:44 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >
> > Does the patch in this thread not fix that?
>
> It should. Modulo the glibc problem that makes it hard to actually set
> the mode properly.
That doesn't help because we explicitly reject O_RDONLY when combined
with O_TMPFILE.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-03 8:34 ` Christoph Hellwig
@ 2014-11-03 17:05 ` Linus Torvalds
2014-11-03 17:06 ` Andy Lutomirski
1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2014-11-03 17:05 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andy Lutomirski, Al Viro, linux-kernel@vger.kernel.org,
Eric Rannaud, Andrew Morton, linux-fsdevel
On Mon, Nov 3, 2014 at 12:34 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> That doesn't help because we explicitly reject O_RDONLY when combined
> with O_TMPFILE.
You obviously cannot actually have a read-only file descriptor with
O_TMPFILE, unless all you care about is a zero-sized file with no
contents.
That's why people are talking about things like downgrading the file
*after* filling it, using sealing.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-03 8:34 ` Christoph Hellwig
2014-11-03 17:05 ` Linus Torvalds
@ 2014-11-03 17:06 ` Andy Lutomirski
2014-11-03 18:49 ` Eric Rannaud
1 sibling, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2014-11-03 17:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-kernel@vger.kernel.org, Eric Rannaud, Andrew Morton,
Al Viro, Linus Torvalds, linux-fsdevel
On Nov 3, 2014 12:34 AM, "Christoph Hellwig" <hch@infradead.org> wrote:
>
> On Fri, Oct 31, 2014 at 11:53:09AM -0700, Linus Torvalds wrote:
> > On Fri, Oct 31, 2014 at 11:44 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > >
> > > Does the patch in this thread not fix that?
> >
> > It should. Modulo the glibc problem that makes it hard to actually set
> > the mode properly.
>
> That doesn't help because we explicitly reject O_RDONLY when combined
> with O_TMPFILE.
I think I'm missing something. How is an O_RDONLY temporary file
useful? Wouldn't you want an O_RDWR tempfile with mode 0400 or
something like that?
--Andy
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-03 17:06 ` Andy Lutomirski
@ 2014-11-03 18:49 ` Eric Rannaud
2014-11-03 19:04 ` Linus Torvalds
2014-11-03 22:07 ` Jeremy Allison
0 siblings, 2 replies; 23+ messages in thread
From: Eric Rannaud @ 2014-11-03 18:49 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Christoph Hellwig, linux-kernel@vger.kernel.org, Andrew Morton,
Al Viro, Linus Torvalds, linux-fsdevel
On Mon, Nov 3, 2014 at 9:06 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> That doesn't help because we explicitly reject O_RDONLY when combined
>> with O_TMPFILE.
>
> I think I'm missing something. How is an O_RDONLY temporary file
> useful? Wouldn't you want an O_RDWR tempfile with mode 0400 or
> something like that?
Isn't it because they are essentially emulating an atomic open()
capable of creating a file with inherited ACLs, according to
relatively complex rules? open *can* be used with O_CREAT|O_RDONLY
(touch(1) might do that), which would naively translate into:
fd = open(dir, O_TMPFILE|O_RDONLY, 0600)
fsetxattr(fd, "...")
fsetxattr(fd, "...")
linkat(AT_FDCWD, "/proc/self/fd/...", ..., AT_SYMLINK_FOLLOW)
return fd;
Now this would be happening on the server, and the only reason why it
would be important to ensure that fd is O_RDONLY, is that smbd does
not do its own bookkeeping of how each file handle was opened, and
would rather have the kernel enforce O_RDONLY?
With O_TMPFILE as implemented now, smbd would have to do open(dir,
O_TMPFILE|O_RDWR, 0600), but internally keep track that O_RDONLY was
requested by the client on that fd, and block any writes to fd itself.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-03 18:49 ` Eric Rannaud
@ 2014-11-03 19:04 ` Linus Torvalds
2014-11-05 8:27 ` Christoph Hellwig
2014-11-03 22:07 ` Jeremy Allison
1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2014-11-03 19:04 UTC (permalink / raw)
To: Eric Rannaud
Cc: Andy Lutomirski, Christoph Hellwig, linux-kernel@vger.kernel.org,
Andrew Morton, Al Viro, linux-fsdevel
On Mon, Nov 3, 2014 at 10:49 AM, Eric Rannaud <e@nanocritical.com> wrote:
>
> Isn't it because they are essentially emulating an atomic open()
> capable of creating a file with inherited ACLs, according to
> relatively complex rules? open *can* be used with O_CREAT|O_RDONLY
> (touch(1) might do that), which would naively translate into:
Oh, so you don't actually need any file contents at all?
If that is actually a real usage, then maybe we should just say that
"O_TMPFILE|O_RDONLY" is fine, and remove the check that it has to be
writable.
That check was always a sanity-check, because people felt that a
temp-file you can't write to is an insane concept. But if there is a
real use case for it, then clearly it's not completely insane. Just
odd.
It's just that single
if (!(acc_mode & MAY_WRITE))
return -EINVAL;
test in build_open_flags(), right?
I'd take a tested patch to remove that (where "tested" means: "yes, I
actually did that unwritable file descriptor thing, and it actually
solved the problem and worked for samba or whatever")
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-03 19:04 ` Linus Torvalds
@ 2014-11-05 8:27 ` Christoph Hellwig
2014-11-05 15:21 ` Eric Rannaud
0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2014-11-05 8:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: Eric Rannaud, Andy Lutomirski, Christoph Hellwig,
linux-kernel@vger.kernel.org, Andrew Morton, Al Viro,
linux-fsdevel
On Mon, Nov 03, 2014 at 11:04:27AM -0800, Linus Torvalds wrote:
> Oh, so you don't actually need any file contents at all?
>
> If that is actually a real usage, then maybe we should just say that
> "O_TMPFILE|O_RDONLY" is fine, and remove the check that it has to be
> writable.
Wasn't this disallowed to prevent problems on old kernels that don't use
O_TMPFILE? In that case we'd ignore the flag and would just get a file
handle for the directory instead.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-05 8:27 ` Christoph Hellwig
@ 2014-11-05 15:21 ` Eric Rannaud
2014-11-05 15:32 ` Andy Lutomirski
0 siblings, 1 reply; 23+ messages in thread
From: Eric Rannaud @ 2014-11-05 15:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linus Torvalds, Andy Lutomirski, linux-kernel@vger.kernel.org,
Andrew Morton, Al Viro, linux-fsdevel
On Wed, Nov 5, 2014 at 12:27 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Nov 03, 2014 at 11:04:27AM -0800, Linus Torvalds wrote:
>> Oh, so you don't actually need any file contents at all?
>>
>> If that is actually a real usage, then maybe we should just say that
>> "O_TMPFILE|O_RDONLY" is fine, and remove the check that it has to be
>> writable.
>
> Wasn't this disallowed to prevent problems on old kernels that don't use
> O_TMPFILE? In that case we'd ignore the flag and would just get a file
> handle for the directory instead.
Yes, that was the idea at first, as discussed at
http://article.gmane.org/gmane.linux.file-systems/76273, in commit
bb458c644. But soon after that, O_WRONLY was allowed (ba57ea64cb1),
and O_RDWR was removed from O_TMPFILE. Now only remain the explicit
checks in build_open_flags() that Linus mentioned.
If we allow
fd = open("/tmp", O_TMPFILE|O_RDONLY, 0600)
it would be seen by an old kernel as
fd = open("/tmp", O_DIRECTORY|O_RDONLY, 0600)
which will succeed.
But unlike the other cases the creative definition of O_TMPFILE was
meant to prevent, this does not create a security risk for anyone
implementing a secure tmpfile, as they would be asking for a writable
fd.
To implement an atomic open() with O_TMPFILE+flink, if neither
O_WRONLY nor O_RDWR is in flags, you would have to manually check with
fstat that fd is indeed a regular file and not a directory. At least
if you need to run on old kernels.
If such a changes goes in, the man page for open(2) should talk about
what happens on old kernels (it already has an explanation for the
writable case).
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-05 15:21 ` Eric Rannaud
@ 2014-11-05 15:32 ` Andy Lutomirski
2014-11-05 17:10 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Andy Lutomirski @ 2014-11-05 15:32 UTC (permalink / raw)
To: Eric Rannaud
Cc: Christoph Hellwig, Linus Torvalds, linux-kernel@vger.kernel.org,
Andrew Morton, Al Viro, linux-fsdevel
On Wed, Nov 5, 2014 at 7:21 AM, Eric Rannaud <e@nanocritical.com> wrote:
> On Wed, Nov 5, 2014 at 12:27 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Mon, Nov 03, 2014 at 11:04:27AM -0800, Linus Torvalds wrote:
>>> Oh, so you don't actually need any file contents at all?
>>>
>>> If that is actually a real usage, then maybe we should just say that
>>> "O_TMPFILE|O_RDONLY" is fine, and remove the check that it has to be
>>> writable.
>>
>> Wasn't this disallowed to prevent problems on old kernels that don't use
>> O_TMPFILE? In that case we'd ignore the flag and would just get a file
>> handle for the directory instead.
>
> Yes, that was the idea at first, as discussed at
> http://article.gmane.org/gmane.linux.file-systems/76273, in commit
> bb458c644. But soon after that, O_WRONLY was allowed (ba57ea64cb1),
> and O_RDWR was removed from O_TMPFILE. Now only remain the explicit
> checks in build_open_flags() that Linus mentioned.
>
> If we allow
> fd = open("/tmp", O_TMPFILE|O_RDONLY, 0600)
> it would be seen by an old kernel as
> fd = open("/tmp", O_DIRECTORY|O_RDONLY, 0600)
> which will succeed.
>
> But unlike the other cases the creative definition of O_TMPFILE was
> meant to prevent, this does not create a security risk for anyone
> implementing a secure tmpfile, as they would be asking for a writable
> fd.
I have no idea whether it's a security risk, but it's a serious "wtf?!?" risk.
>
> To implement an atomic open() with O_TMPFILE+flink, if neither
> O_WRONLY nor O_RDWR is in flags, you would have to manually check with
> fstat that fd is indeed a regular file and not a directory. At least
> if you need to run on old kernels.
>
> If such a changes goes in, the man page for open(2) should talk about
> what happens on old kernels (it already has an explanation for the
> writable case).
With my occasional-API-reviewer hat on, NAK NAK NAK. This is even
more insane than the rest of the O_TMPFILE interface. If you want to
support this kind of use case (which seems entirely reasonable to me),
then just add a syscall, please.
Also, you can, mostly, downgrade from O_RDWR to O_RDONLY. We could
easily add a way to downgrade all the way, I think.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-05 15:32 ` Andy Lutomirski
@ 2014-11-05 17:10 ` Linus Torvalds
0 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2014-11-05 17:10 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Eric Rannaud, Christoph Hellwig, linux-kernel@vger.kernel.org,
Andrew Morton, Al Viro, linux-fsdevel
On Wed, Nov 5, 2014 at 7:32 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I have no idea whether it's a security risk, but it's a serious "wtf?!?" risk.
Yeah. And considering that the only apparently user of this read-only
O_TMPFILE interface doesn't actually need it anyway because it carries
around the whole "read-only to the client" information independently,
I think the issue is moot. We don't need zero-sized read-only
O_TMPFILE files.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-03 18:49 ` Eric Rannaud
2014-11-03 19:04 ` Linus Torvalds
@ 2014-11-03 22:07 ` Jeremy Allison
2014-11-05 8:25 ` Christoph Hellwig
1 sibling, 1 reply; 23+ messages in thread
From: Jeremy Allison @ 2014-11-03 22:07 UTC (permalink / raw)
To: Eric Rannaud
Cc: Andy Lutomirski, Christoph Hellwig, linux-kernel@vger.kernel.org,
Andrew Morton, Al Viro, Linus Torvalds, linux-fsdevel
On Mon, Nov 03, 2014 at 10:49:24AM -0800, Eric Rannaud wrote:
> On Mon, Nov 3, 2014 at 9:06 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> That doesn't help because we explicitly reject O_RDONLY when combined
> >> with O_TMPFILE.
> >
> > I think I'm missing something. How is an O_RDONLY temporary file
> > useful? Wouldn't you want an O_RDWR tempfile with mode 0400 or
> > something like that?
>
> Isn't it because they are essentially emulating an atomic open()
> capable of creating a file with inherited ACLs, according to
> relatively complex rules? open *can* be used with O_CREAT|O_RDONLY
> (touch(1) might do that), which would naively translate into:
>
> fd = open(dir, O_TMPFILE|O_RDONLY, 0600)
> fsetxattr(fd, "...")
> fsetxattr(fd, "...")
> linkat(AT_FDCWD, "/proc/self/fd/...", ..., AT_SYMLINK_FOLLOW)
> return fd;
>
> Now this would be happening on the server, and the only reason why it
> would be important to ensure that fd is O_RDONLY, is that smbd does
> not do its own bookkeeping of how each file handle was opened, and
> would rather have the kernel enforce O_RDONLY?
>
> With O_TMPFILE as implemented now, smbd would have to do open(dir,
> O_TMPFILE|O_RDWR, 0600), but internally keep track that O_RDONLY was
> requested by the client on that fd, and block any writes to fd itself.
Which we already do, actually..
Although the atomic open emulation is
a very interesting idea for us. That's
something we currently don't do correctly
across different protocols (although we
do it between smbd's themselves).
Jeremy.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-11-03 22:07 ` Jeremy Allison
@ 2014-11-05 8:25 ` Christoph Hellwig
0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2014-11-05 8:25 UTC (permalink / raw)
To: Jeremy Allison
Cc: Eric Rannaud, Andy Lutomirski, Christoph Hellwig,
linux-kernel@vger.kernel.org, Andrew Morton, Al Viro,
Linus Torvalds, linux-fsdevel
On Mon, Nov 03, 2014 at 02:07:07PM -0800, Jeremy Allison wrote:
> Which we already do, actually..
Hmm, I didn't notice that part when implementing the atomic open,
guess I need to dig deeper into the code.
I still think supporting full open semantics with O_TMFILE + flink is
useful, so that others can make use of it it without such an additional
enforcement layer.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-30 22:58 ` Linus Torvalds
2014-10-31 0:01 ` Andy Lutomirski
@ 2014-10-31 0:57 ` Eric Rannaud
2014-10-31 1:04 ` Linus Torvalds
1 sibling, 1 reply; 23+ messages in thread
From: Eric Rannaud @ 2014-10-31 0:57 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Linux Kernel Mailing List, Alexander Viro,
linux-fsdevel, Andrew Morton
On Thu, Oct 30, 2014 at 3:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Oct 30, 2014 at 3:48 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Agreed. Will apply and add the stable cc.
>
> Ho humm. Thinking about this some more, I'm starting to wonder. Not
> about this patch per se (open on a newly created file should indeed
> succeed regardless), but about the horrible glibc behavior of screwing
> up the third argument.
>
> If you want to do O_TMPFILE + linkat() (or some eventual future
> flink()), the mode really matters. So this idiotic glibc behavior of
> only forwarding the third argument if O_CREAT is set seems to be a
> bug.
Yes, there definitely is a glibc bug: a fix is being worked on and it
looks like it will go in. The change replaces the test for O_CREAT by
a test for either O_CREAT or O_TMPFILE.
> Why the hell does glibc think it's a good idea to intersect system
> call semantics? It's not a good idea - it's just stupid in the
> extreme. And in this case it seems to actively breaks things.
What Andy said: it's the most portable, because of the optional last
argument, if you have a bunch of wrappers for open and openat written
in C, which they do.
Some of these wrappers are for statically checking that the mode
argument is present when necessary (using __builtin_constant_p,
__builtin_va_arg_pack, and friends), and to "fortify" the code against
a malicious (?) injection of O_CREAT in a code path that didn't have
it at compile time (where flags was a build constant). Other wrappers
add O_LARGEFILE. One wrapper for openat checks that dirfd is a
directory (no idea why, as the kernel does the same check -- without a
race; maybe to try to guarantee ENOTDIR on some old kernel version?
It's been here since 2005, when openat was added to glibc).
--
Eric Rannaud <e@nanocritical.com>
Nanocritical, CEO
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-31 0:57 ` Eric Rannaud
@ 2014-10-31 1:04 ` Linus Torvalds
2014-10-31 2:12 ` Eric Rannaud
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2014-10-31 1:04 UTC (permalink / raw)
To: Eric Rannaud
Cc: Andy Lutomirski, Linux Kernel Mailing List, Alexander Viro,
linux-fsdevel, Andrew Morton
On Thu, Oct 30, 2014 at 5:57 PM, Eric Rannaud <e@nanocritical.com> wrote:
>
> Yes, there definitely is a glibc bug: a fix is being worked on and it
> looks like it will go in. The change replaces the test for O_CREAT by
> a test for either O_CREAT or O_TMPFILE.
Why not just do it unconditionally? There really is no downside. Doing
it conditionally only makes the generated code slower and mode
complex. For absolutely zero gain, as far as I can tell. Does any
architecture actually do anything wrong?
It's actually closer in spirit to the original "open()" model than the
existing code.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0
2014-10-31 1:04 ` Linus Torvalds
@ 2014-10-31 2:12 ` Eric Rannaud
0 siblings, 0 replies; 23+ messages in thread
From: Eric Rannaud @ 2014-10-31 2:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Linux Kernel Mailing List, Alexander Viro,
linux-fsdevel, Andrew Morton
On Thu, Oct 30, 2014 at 6:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Oct 30, 2014 at 5:57 PM, Eric Rannaud <e@nanocritical.com> wrote:
>> Yes, there definitely is a glibc bug: a fix is being worked on and it
>> looks like it will go in. The change replaces the test for O_CREAT by
>> a test for either O_CREAT or O_TMPFILE.
>
> Why not just do it unconditionally? There really is no downside. Doing
> it conditionally only makes the generated code slower and mode
> complex. For absolutely zero gain, as far as I can tell. Does any
> architecture actually do anything wrong?
The immediate answer is "to keep the diff minimal". But I'll ask.
--
Eric Rannaud <e@nanocritical.com>
Nanocritical, CEO
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2014-11-05 17:10 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-30 9:08 [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0 Eric Rannaud
2014-10-30 21:48 ` Andy Lutomirski
2014-10-30 22:48 ` Linus Torvalds
2014-10-30 22:58 ` Linus Torvalds
2014-10-31 0:01 ` Andy Lutomirski
2014-10-31 0:59 ` Linus Torvalds
2014-10-31 8:42 ` Christoph Hellwig
2014-10-31 18:44 ` Andy Lutomirski
2014-10-31 18:53 ` Linus Torvalds
2014-11-03 8:34 ` Christoph Hellwig
2014-11-03 17:05 ` Linus Torvalds
2014-11-03 17:06 ` Andy Lutomirski
2014-11-03 18:49 ` Eric Rannaud
2014-11-03 19:04 ` Linus Torvalds
2014-11-05 8:27 ` Christoph Hellwig
2014-11-05 15:21 ` Eric Rannaud
2014-11-05 15:32 ` Andy Lutomirski
2014-11-05 17:10 ` Linus Torvalds
2014-11-03 22:07 ` Jeremy Allison
2014-11-05 8:25 ` Christoph Hellwig
2014-10-31 0:57 ` Eric Rannaud
2014-10-31 1:04 ` Linus Torvalds
2014-10-31 2:12 ` Eric Rannaud
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).