* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
[not found] ` <152720691829.9073.10564431140980997005.stgit@warthog.procyon.org.uk>
@ 2018-06-01 6:26 ` Christoph Hellwig
2018-06-01 6:39 ` Al Viro
2018-06-01 8:27 ` David Howells
2018-06-01 8:02 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Amir Goldstein
2018-06-01 8:42 ` David Howells
2 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-06-01 6:26 UTC (permalink / raw)
To: David Howells; +Cc: viro, linux-fsdevel, linux-afs, linux-kernel, linux-api
On Fri, May 25, 2018 at 01:08:38AM +0100, David Howells wrote:
> Make it possible to clone a mount tree with a new pair of open flags that
> are used in conjunction with O_PATH:
>
> (1) O_CLONE_MOUNT - Clone the mount or mount tree at the path.
>
> (2) O_NON_RECURSIVE - Don't clone recursively.
Err. I don't think we should use up two O_* flags for something
only useful for your new mount API. Don't we have a better place
to for these flags?
Instead of overloading this on open having a specific syscalls just
seems like a much saner idea.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
2018-06-01 6:26 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Christoph Hellwig
@ 2018-06-01 6:39 ` Al Viro
2018-06-01 8:27 ` David Howells
1 sibling, 0 replies; 19+ messages in thread
From: Al Viro @ 2018-06-01 6:39 UTC (permalink / raw)
To: Christoph Hellwig
Cc: David Howells, linux-fsdevel, linux-afs, linux-kernel, linux-api
On Thu, May 31, 2018 at 11:26:54PM -0700, Christoph Hellwig wrote:
> On Fri, May 25, 2018 at 01:08:38AM +0100, David Howells wrote:
> > Make it possible to clone a mount tree with a new pair of open flags that
> > are used in conjunction with O_PATH:
> >
> > (1) O_CLONE_MOUNT - Clone the mount or mount tree at the path.
> >
> > (2) O_NON_RECURSIVE - Don't clone recursively.
>
> Err. I don't think we should use up two O_* flags for something
> only useful for your new mount API. Don't we have a better place
> to for these flags?
>
> Instead of overloading this on open having a specific syscalls just
> seems like a much saner idea.
It's not just mount API; these can be used independently of that.
Think of the uses where you pass those to ...at() and you'll see
a bunch of applications of that thing.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
[not found] ` <152720691829.9073.10564431140980997005.stgit@warthog.procyon.org.uk>
2018-06-01 6:26 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Christoph Hellwig
@ 2018-06-01 8:02 ` Amir Goldstein
2018-06-01 8:42 ` David Howells
2 siblings, 0 replies; 19+ messages in thread
From: Amir Goldstein @ 2018-06-01 8:02 UTC (permalink / raw)
To: David Howells; +Cc: Al Viro, linux-fsdevel, linux-afs, linux-kernel, linux-api
[added linux-api]
On Fri, May 25, 2018 at 3:08 AM, David Howells <dhowells@redhat.com> wrote:
> Make it possible to clone a mount tree with a new pair of open flags that
> are used in conjunction with O_PATH:
>
> (1) O_CLONE_MOUNT - Clone the mount or mount tree at the path.
>
> (2) O_NON_RECURSIVE - Don't clone recursively.
>
> Note that it's not a good idea to reuse other flags (such as O_CREAT)
> because the open routine for O_PATH does not give an error if any other
> flags are used in conjunction with O_PATH, but rather just masks off any it
> doesn't use.
>
> The resultant file struct is marked FMODE_NEED_UNMOUNT to as it pins an
> extra reference for the mount. This will be cleared by the upcoming
> move_mount() syscall when it successfully moves a cloned mount into the
> filesystem tree.
>
> Note that care needs to be taken with the error handling in do_o_path() in
> the case that vfs_open() fails as the path may or may not have been
> attached to the file struct and FMODE_NEED_UNMOUNT may or may not be set.
> Note that O_DIRECT | O_PATH could be a problem with error handling too.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
>
[...]
> @@ -977,8 +979,11 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
> * If we have O_PATH in the open flag. Then we
> * cannot have anything other than the below set of flags
> */
> - flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH;
> + flags &= (O_DIRECTORY | O_NOFOLLOW | O_PATH |
> + O_CLONE_MOUNT | O_NON_RECURSIVE);
> acc_mode = 0;
> + } else if (flags & (O_CLONE_MOUNT | O_NON_RECURSIVE)) {
> + return -EINVAL;
Reject O_NON_RECURSIVE without O_CLONE_MOUNT?
That would free at least one flag combination for future use.
Doesn't it make more sense for user API to opt-into
O_RECURSIVE_CLONE, rather than opt-out of it?
> }
>
> op->open_flag = flags;
> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
> index 27dc7a60693e..8f60e2244740 100644
> --- a/include/linux/fcntl.h
> +++ b/include/linux/fcntl.h
> @@ -9,7 +9,8 @@
> (O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
> O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \
> FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
> - O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
> + O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | \
> + O_CLONE_MOUNT | O_NON_RECURSIVE)
>
> #ifndef force_o_largefile
> #define force_o_largefile() (BITS_PER_LONG != 32)
> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> index 0b1c7e35090c..f533e35ea19b 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -88,6 +88,14 @@
> #define __O_TMPFILE 020000000
> #endif
>
> +#ifndef O_CLONE_MOUNT
> +#define O_CLONE_MOUNT 040000000 /* Used with O_PATH to clone the mount subtree at path */
> +#endif
> +
> +#ifndef O_NON_RECURSIVE
> +#define O_NON_RECURSIVE 0100000000 /* Used with O_CLONE_MOUNT to only clone one mount */
> +#endif
> +
> /* a horrid kludge trying to make sure that this will fail on old kernels */
> #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
> #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
>
I am not sure what are the consequences of opening O_PATH with old kernel
and getting an open file, can't think of anything bad.
Can the same be claimed for O_PATH|O_CLONE_MOUNT?
Wouldn't it be better to apply the O_TMPFILE kludge to the new
open flag, so that apps can check if O_CLONE_MOUNT feature is supported
by kernel?
Thanks,
Amir.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
2018-06-01 6:26 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Christoph Hellwig
2018-06-01 6:39 ` Al Viro
@ 2018-06-01 8:27 ` David Howells
2018-06-02 3:09 ` Al Viro
1 sibling, 1 reply; 19+ messages in thread
From: David Howells @ 2018-06-01 8:27 UTC (permalink / raw)
To: Al Viro
Cc: dhowells, Christoph Hellwig, linux-fsdevel, linux-afs,
linux-kernel, linux-api
Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > Instead of overloading this on open having a specific syscalls just
> > seems like a much saner idea.
>
> It's not just mount API; these can be used independently of that.
> Think of the uses where you pass those to ...at() and you'll see
> a bunch of applications of that thing.
I kind of agree with Christoph on this point. Yes, you can use the resultant
fd for other things, but that doesn't mean it has to be obtained initially
through open() or openat() rather than, say, a new pick_mount() syscall.
Further, having more parameters available gives us the opportunity to change
the settings on any mounts we create at the point of creation.
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
[not found] ` <152720691829.9073.10564431140980997005.stgit@warthog.procyon.org.uk>
2018-06-01 6:26 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Christoph Hellwig
2018-06-01 8:02 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Amir Goldstein
@ 2018-06-01 8:42 ` David Howells
2 siblings, 0 replies; 19+ messages in thread
From: David Howells @ 2018-06-01 8:42 UTC (permalink / raw)
To: Amir Goldstein
Cc: dhowells, Al Viro, linux-fsdevel, linux-afs, linux-kernel,
linux-api
Amir Goldstein <amir73il@gmail.com> wrote:
> Reject O_NON_RECURSIVE without O_CLONE_MOUNT?
Yes, I should add that.
> I am not sure what are the consequences of opening O_PATH with old kernel
> and getting an open file, can't think of anything bad.
> Can the same be claimed for O_PATH|O_CLONE_MOUNT?
Yes, actually, there can be consequences. Some files have side effects.
Think open("/dev/foobar", O_PATH).
> Wouldn't it be better to apply the O_TMPFILE kludge to the new
> open flag, so that apps can check if O_CLONE_MOUNT feature is supported
> by kernel?
Ugh. The problem is that the O_TMPFILE kludge can't be done because O_PATH
currently just masks off any bits it's not interested in rather than giving an
error.
Even the O_TMPFILE kludge doesn't protect you against someone having set
random unassigned bits when testing on a kernel that didn't support it.
And this bit:
/*
* Clear out all open flags we don't know about so that we don't report
* them in fcntl(F_GETFD) or similar interfaces.
*/
flags &= VALID_OPEN_FLAGS;
is just plain wrong. Effectively, it allows userspace to set random reserved
bits without consequences. It should give an error instead.
Probably we should really replace open() and openat() both before we can
allocate any further open flags.
</grumble>
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
2018-06-01 8:27 ` David Howells
@ 2018-06-02 3:09 ` Al Viro
2018-06-02 3:42 ` Al Viro
0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2018-06-02 3:09 UTC (permalink / raw)
To: David Howells
Cc: Christoph Hellwig, linux-fsdevel, linux-afs, linux-kernel,
linux-api
On Fri, Jun 01, 2018 at 09:27:43AM +0100, David Howells wrote:
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> > > Instead of overloading this on open having a specific syscalls just
> > > seems like a much saner idea.
> >
> > It's not just mount API; these can be used independently of that.
> > Think of the uses where you pass those to ...at() and you'll see
> > a bunch of applications of that thing.
>
> I kind of agree with Christoph on this point. Yes, you can use the resultant
> fd for other things, but that doesn't mean it has to be obtained initially
> through open() or openat() rather than, say, a new pick_mount() syscall.
>
> Further, having more parameters available gives us the opportunity to change
> the settings on any mounts we create at the point of creation.
open_subtree(int dirfd, const char *pathname, int flags), then? How would
flags be interpreted? What I see mapping at that thing is
* equivalent of O_PATH open
* clone subtree, O_PATH open root
* clone one mount, O_PATH open root
and apparently you want to add (orthogonal to that)
* make shared/slave/private/unbindable
* ditto with recursion?
* same for nodev/nosuid/noexec/noatime/nodiratime/relatime/ro/?
as well as usual AT_... flags (empty path, follow)
Choose the encoding...
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
2018-06-02 3:09 ` Al Viro
@ 2018-06-02 3:42 ` Al Viro
2018-06-02 4:04 ` Al Viro
2018-06-02 15:45 ` David Howells
0 siblings, 2 replies; 19+ messages in thread
From: Al Viro @ 2018-06-02 3:42 UTC (permalink / raw)
To: David Howells
Cc: Christoph Hellwig, linux-fsdevel, linux-afs, linux-kernel,
linux-api
On Sat, Jun 02, 2018 at 04:09:14AM +0100, Al Viro wrote:
> On Fri, Jun 01, 2018 at 09:27:43AM +0100, David Howells wrote:
> > Al Viro <viro@ZenIV.linux.org.uk> wrote:
> >
> > > > Instead of overloading this on open having a specific syscalls just
> > > > seems like a much saner idea.
> > >
> > > It's not just mount API; these can be used independently of that.
> > > Think of the uses where you pass those to ...at() and you'll see
> > > a bunch of applications of that thing.
> >
> > I kind of agree with Christoph on this point. Yes, you can use the resultant
> > fd for other things, but that doesn't mean it has to be obtained initially
> > through open() or openat() rather than, say, a new pick_mount() syscall.
> >
> > Further, having more parameters available gives us the opportunity to change
> > the settings on any mounts we create at the point of creation.
>
> open_subtree(int dirfd, const char *pathname, int flags), then? How would
> flags be interpreted? What I see mapping at that thing is
> * equivalent of O_PATH open
> * clone subtree, O_PATH open root
> * clone one mount, O_PATH open root
> and apparently you want to add (orthogonal to that)
> * make shared/slave/private/unbindable
> * ditto with recursion?
> * same for nodev/nosuid/noexec/noatime/nodiratime/relatime/ro/?
> as well as usual AT_... flags (empty path, follow)
>
> Choose the encoding...
_If_ I'm interpreting that correctly, that should be something like a bitmap
of attributes to modify + values to set for each. Let's see -
propagation 1 + 2 bits
nodev 1 + 1
noexec 1 + 1
nosuid 1 + 1
ro 1 + 1
atime 1 + 3
That's 15 bits. On top of that, we have 1 bit for "clone or original"
and 1 bit for "recursive or single-mount". As well as AT_EMPTY_PATH,
and AT_NO_AUTOMOUNT (inconvenient, since these are fixed bits). In
principle, that does fit into int, with some space to spare...
Is that what you have in mind?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
2018-06-02 3:42 ` Al Viro
@ 2018-06-02 4:04 ` Al Viro
2018-06-02 15:45 ` David Howells
1 sibling, 0 replies; 19+ messages in thread
From: Al Viro @ 2018-06-02 4:04 UTC (permalink / raw)
To: David Howells
Cc: Christoph Hellwig, linux-fsdevel, linux-afs, linux-kernel,
linux-api
On Sat, Jun 02, 2018 at 04:42:56AM +0100, Al Viro wrote:
> _If_ I'm interpreting that correctly, that should be something like a bitmap
> of attributes to modify + values to set for each. Let's see -
> propagation 1 + 2 bits
> nodev 1 + 1
> noexec 1 + 1
> nosuid 1 + 1
> ro 1 + 1
> atime 1 + 3
> That's 15 bits. On top of that, we have 1 bit for "clone or original"
> and 1 bit for "recursive or single-mount". As well as AT_EMPTY_PATH,
> and AT_NO_AUTOMOUNT (inconvenient, since these are fixed bits). In
> principle, that does fit into int, with some space to spare...
>
> Is that what you have in mind?
TBH, I would probably prefer separate mount_setattr(2) for that kind
of work, with something like
int mount_setattr(int dirfd, const char *path, int flags, int attr)
*not* opening any files.
flags:
AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_RECURSIVE
attr:
MOUNT_SETATTR_DEV (1<<0)
MOUNT_SETATTR_NODEV (1<<0)|(1<<1)
MOUNT_SETATTR_EXEC (1<<2)
MOUNT_SETATTR_NOEXEC (1<<2)|(1<<3)
MOUNT_SETATTR_SUID (1<<4)
MOUNT_SETATTR_NOSUID (1<<4)|(1<<5)
MOUNT_SETATTR_RW (1<<6)
MOUNT_SETATTR_RO (1<<6)|(1<<7)
MOUNT_SETATTR_RELATIME (1<<8)
MOUNT_SETATTR_NOATIME (1<<8)|(1<<9)
MOUNT_SETATTR_NODIRATIME (1<<8)|(2<<9)
MOUNT_SETATTR_STRICTATIME (1<<8)|(3<<9)
MOUNT_SETATTR_PRIVATE (1<<11)
MOUNT_SETATTR_SHARED (1<<11)|(1<<12)
MOUNT_SETATTR_SLAVE (1<<11)|(2<<12)
MOUNT_SETATTR_UNBINDABLE (1<<11)|(3<<12)
With either openat() used as in this series, or explicit
int open_tree(int dirfd, const char *path, int flags)
returning a descriptor, with flags being
AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_RECURSIVE, AT_CLONE
with AT_RECURSIVE without AT_CLONE being an error. Hell, might
even add AT_UMOUNT for "open root and detach, to be dissolved on
close", incompatible with AT_CLONE.
Comments?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
2018-06-02 3:42 ` Al Viro
2018-06-02 4:04 ` Al Viro
@ 2018-06-02 15:45 ` David Howells
2018-06-02 17:49 ` Al Viro
1 sibling, 1 reply; 19+ messages in thread
From: David Howells @ 2018-06-02 15:45 UTC (permalink / raw)
To: Al Viro
Cc: dhowells, Christoph Hellwig, linux-fsdevel, linux-afs,
linux-kernel, linux-api
Al Viro <viro@ZenIV.linux.org.uk> wrote:
> TBH, I would probably prefer separate mount_setattr(2) for that kind
> of work, with something like
> int mount_setattr(int dirfd, const char *path, int flags, int attr)
> *not* opening any files.
> flags:
> AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_RECURSIVE
I would call these MOUNT_SETATTR_* rather than AT_*.
> attr:
> MOUNT_SETATTR_DEV (1<<0)
> MOUNT_SETATTR_NODEV (1<<0)|(1<<1)
> MOUNT_SETATTR_EXEC (1<<2)
> MOUNT_SETATTR_NOEXEC (1<<2)|(1<<3)
> MOUNT_SETATTR_SUID (1<<4)
> MOUNT_SETATTR_NOSUID (1<<4)|(1<<5)
> MOUNT_SETATTR_RW (1<<6)
> MOUNT_SETATTR_RO (1<<6)|(1<<7)
> MOUNT_SETATTR_RELATIME (1<<8)
> MOUNT_SETATTR_NOATIME (1<<8)|(1<<9)
> MOUNT_SETATTR_NODIRATIME (1<<8)|(2<<9)
> MOUNT_SETATTR_STRICTATIME (1<<8)|(3<<9)
> MOUNT_SETATTR_PRIVATE (1<<11)
> MOUNT_SETATTR_SHARED (1<<11)|(1<<12)
> MOUNT_SETATTR_SLAVE (1<<11)|(2<<12)
> MOUNT_SETATTR_UNBINDABLE (1<<11)|(3<<12)
So, I like this generally, some notes though:
I wonder if this should be two separate parameters, a mask and the settings?
I'm not sure that's worth it since some of the mask bits would cover multiple
settings.
Also, should NODIRATIME be separate from the other *ATIME flags? I do also
like treating some of these settings as enumerations rather than a set of
bits.
I would make the prototype:
int mount_setattr(int dirfd, const char *path,
unsigned int flags, unsigned int attr,
void *reserved5);
Further, do we want to say you can either change the propagation type *or*
reconfigure the mountpoint restrictions, but not both at the same time?
> With either openat() used as in this series, or explicit
> int open_tree(int dirfd, const char *path, int flags)
Maybe open_mount(), grab_mount() or pick_mount()?
I wonder if fsopen()/fspick() should be create_fs()/open_fs()...
> returning a descriptor, with flags being
> AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_RECURSIVE, AT_CLONE
> with AT_RECURSIVE without AT_CLONE being an error.
You also need an O_CLOEXEC equivalent.
I would make it:
OPEN_TREE_CLOEXEC 0x00000001
OPEN_TREE_EMPTY_PATH 0x00000002
OPEN_TREE_FOLLOW_SYMLINK 0x00000004
OPEN_TREE_NO_AUTOMOUNT 0x00000008
OPEN_TREE_CLONE 0x00000010
OPEN_TREE_RECURSIVE 0x00000020
adding the follow-symlinks so that you don't grab a symlink target by
accident. (Can you actually mount on top of a symlink?)
> Hell, might even add AT_UMOUNT for "open root and detach, to be dissolved on
> close", incompatible with AT_CLONE.
Cute. Guess you could do:
fd = open_mount(..., OPEN_TREE_DETACH);
mount_setattr(fd, "",
MOUNT_SETATTR_EMPTY_PATH,
MOUNT_SETATTR_NOSUID, NULL);
move_mount(fd, "", ...);
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]
2018-06-02 15:45 ` David Howells
@ 2018-06-02 17:49 ` Al Viro
2018-06-03 0:55 ` [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]) Al Viro
0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2018-06-02 17:49 UTC (permalink / raw)
To: David Howells
Cc: Christoph Hellwig, linux-fsdevel, linux-afs, linux-kernel,
linux-api
On Sat, Jun 02, 2018 at 04:45:21PM +0100, David Howells wrote:
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
>
> > TBH, I would probably prefer separate mount_setattr(2) for that kind
> > of work, with something like
> > int mount_setattr(int dirfd, const char *path, int flags, int attr)
> > *not* opening any files.
> > flags:
> > AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_RECURSIVE
>
> I would call these MOUNT_SETATTR_* rather than AT_*.
Why? AT_EMPTY_PATH/AT_NO_AUTOMOUNT are common with other ...at()
syscalls; AT_RECURSIVE - maybe, but it's still more like AT_...
namespace fodder, IMO.
> > attr:
> > MOUNT_SETATTR_DEV (1<<0)
> > MOUNT_SETATTR_NODEV (1<<0)|(1<<1)
> > MOUNT_SETATTR_EXEC (1<<2)
> > MOUNT_SETATTR_NOEXEC (1<<2)|(1<<3)
> > MOUNT_SETATTR_SUID (1<<4)
> > MOUNT_SETATTR_NOSUID (1<<4)|(1<<5)
> > MOUNT_SETATTR_RW (1<<6)
> > MOUNT_SETATTR_RO (1<<6)|(1<<7)
> > MOUNT_SETATTR_RELATIME (1<<8)
> > MOUNT_SETATTR_NOATIME (1<<8)|(1<<9)
> > MOUNT_SETATTR_NODIRATIME (1<<8)|(2<<9)
> > MOUNT_SETATTR_STRICTATIME (1<<8)|(3<<9)
> > MOUNT_SETATTR_PRIVATE (1<<11)
> > MOUNT_SETATTR_SHARED (1<<11)|(1<<12)
> > MOUNT_SETATTR_SLAVE (1<<11)|(2<<12)
> > MOUNT_SETATTR_UNBINDABLE (1<<11)|(3<<12)
>
> So, I like this generally, some notes though:
>
> I wonder if this should be two separate parameters, a mask and the settings?
> I'm not sure that's worth it since some of the mask bits would cover multiple
> settings.
Nah, better put those bits in the same word, as in above. Here bits 0, 2, 4, 6,
8 and 11 tell which attributes are to be modified, with values to set living
in bits 1, 3, 5, 7, 9--10 and 12--13. Look at the constants above..
> Also, should NODIRATIME be separate from the other *ATIME flags? I do also
> like treating some of these settings as enumerations rather than a set of
> bits.
Huh? That's precisely what I'm doing there: bit 8 is "want to change atime
settings", bits 9 and 10 hold a 4-element enumeration (rel/no/nodir/strict).
Similar for propagation settings (bit 11 indicates that we want to set those,
bits 12 and 13 - 4-element enum)...
> I would make the prototype:
>
> int mount_setattr(int dirfd, const char *path,
> unsigned int flags, unsigned int attr,
> void *reserved5);
>
> Further, do we want to say you can either change the propagation type *or*
> reconfigure the mountpoint restrictions, but not both at the same time?
Why? MOUNT_SETATTR_PRIVATE | MOUNT_NOATIME | MOUNT_SUID, i.e. 00101100010000,
i.e. 0xb10 for "turn nosuid off, switch atime polcy to noatime, change propagation
to private, leave everything else as-is"...
And for fsck sake, what's that "void *reserved5" for?
> > With either openat() used as in this series, or explicit
> > int open_tree(int dirfd, const char *path, int flags)
>
> Maybe open_mount(), grab_mount() or pick_mount()?
>
> I wonder if fsopen()/fspick() should be create_fs()/open_fs()...
>
> > returning a descriptor, with flags being
> > AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_RECURSIVE, AT_CLONE
> > with AT_RECURSIVE without AT_CLONE being an error.
>
> You also need an O_CLOEXEC equivalent.
Point.
> I would make it:
>
> OPEN_TREE_CLOEXEC 0x00000001
Why not O_CLOEXEC, as with epoll_create()/timerfd_create()/etc.?
> OPEN_TREE_EMPTY_PATH 0x00000002
> OPEN_TREE_FOLLOW_SYMLINK 0x00000004
> OPEN_TREE_NO_AUTOMOUNT 0x00000008
Why? How are those different from normal AT_EMPTY_PATH/AT_NO_AUTOMOUNT?
> OPEN_TREE_CLONE 0x00000010
> OPEN_TREE_RECURSIVE 0x00000020
>
> adding the follow-symlinks so that you don't grab a symlink target by
> accident. (Can you actually mount on top of a symlink?)
You can't - mount(2) uses LOOKUP_FOLLOW for mountpoint (well, user_path(),
actually).
> > Hell, might even add AT_UMOUNT for "open root and detach, to be dissolved on
> > close", incompatible with AT_CLONE.
>
> Cute. Guess you could do:
>
> fd = open_mount(..., OPEN_TREE_DETACH);
> mount_setattr(fd, "",
> MOUNT_SETATTR_EMPTY_PATH,
> MOUNT_SETATTR_NOSUID, NULL);
> move_mount(fd, "", ...);
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-02 17:49 ` Al Viro
@ 2018-06-03 0:55 ` Al Viro
2018-06-04 10:34 ` Miklos Szeredi
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Al Viro @ 2018-06-03 0:55 UTC (permalink / raw)
To: David Howells
Cc: Christoph Hellwig, linux-fsdevel, linux-afs, linux-kernel,
linux-api
On Sat, Jun 02, 2018 at 06:49:58PM +0100, Al Viro wrote:
> > > Hell, might even add AT_UMOUNT for "open root and detach, to be dissolved on
> > > close", incompatible with AT_CLONE.
> >
> > Cute. Guess you could do:
> >
> > fd = open_mount(..., OPEN_TREE_DETACH);
> > mount_setattr(fd, "",
> > MOUNT_SETATTR_EMPTY_PATH,
> > MOUNT_SETATTR_NOSUID, NULL);
> > move_mount(fd, "", ...);
Hadn't added that yet, but as for the rest of open_tree() - see
vfs.git#mount-open_tree. open() and its flags are not touched at all.
Other changes compared to the previous:
* may_mount() is required for OPEN_TREE_CLONE
* sys_ni.c cruft is dropped - those make no sense until and unless
those syscalls become conditional.
Appears to work, combined with move_mount() it yields eqiuvalents of
mount --{move,bind,rbind}. Combined with mount_setattr(2) (when that
gets added) we'll get mount -o remount,bind,nodev et.al.
(including the currently absent whole-subtree versions) and
mount --make-{r,}{shared,slave,private,unbindable}
It also can be used to get an isolated subtree usable for ....at()
stuff.
The addition of syscall itself is done by the following and I'd really
like linux-abi folks to comment on that puppy
commit 6cfba4dd99b10278c2156c8d4fced2eddedf167f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat Jun 2 19:42:22 2018 -0400
new syscall: open_tree(2)
open_tree(dfd, pathname, flags)
Returns an O_PATH-opened file descriptor or an error.
dfd and pathname specify the location to open, in usual
fashion (see e.g. fstatat(2)). flags should be an OR of
some of the following:
* AT_PATH_EMPTY, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW -
same meanings as usual
* OPEN_TREE_CLOEXEC - make the resulting descriptor
close-on-exec
* OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE -
instead of opening the location in question, create a detached
mount tree matching the subtree rooted at location specified by
dfd/pathname. With AT_RECURSIVE the entire subtree is cloned,
without it - only the part within in the mount containing the
location in question. In other words, the same as mount --rbind
or mount --bind would've taken. The detached tree will be
dissolved on the final close of obtained file. Creation of such
detached trees requires the same capabilities as doing mount --bind.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 14a2f996e543..b2b44ecd2b17 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -397,3 +397,4 @@
383 i386 statx sys_statx __ia32_sys_statx
384 i386 arch_prctl sys_arch_prctl __ia32_compat_sys_arch_prctl
385 i386 io_pgetevents sys_io_pgetevents __ia32_compat_sys_io_pgetevents
+391 i386 open_tree sys_open_tree __ia32_sys_open_tree
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index cd36232ab62f..d6f4949378e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -342,6 +342,7 @@
331 common pkey_free __x64_sys_pkey_free
332 common statx __x64_sys_statx
333 common io_pgetevents __x64_sys_io_pgetevents
+339 common open_tree __x64_sys_open_tree
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/file_table.c b/fs/file_table.c
index 7ec0b3e5f05d..7480271a0d21 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -189,6 +189,7 @@ static void __fput(struct file *file)
struct dentry *dentry = file->f_path.dentry;
struct vfsmount *mnt = file->f_path.mnt;
struct inode *inode = file->f_inode;
+ fmode_t mode = file->f_mode;
might_sleep();
@@ -209,14 +210,14 @@ static void __fput(struct file *file)
file->f_op->release(inode, file);
security_file_free(file);
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
- !(file->f_mode & FMODE_PATH))) {
+ !(mode & FMODE_PATH))) {
cdev_put(inode->i_cdev);
}
fops_put(file->f_op);
put_pid(file->f_owner.pid);
- if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
+ if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
i_readcount_dec(inode);
- if (file->f_mode & FMODE_WRITER) {
+ if (mode & FMODE_WRITER) {
put_write_access(inode);
__mnt_drop_write(mnt);
}
@@ -224,6 +225,8 @@ static void __fput(struct file *file)
file->f_path.mnt = NULL;
file->f_inode = NULL;
file_free(file);
+ if (unlikely(mode & FMODE_NEED_UNMOUNT))
+ dissolve_on_fput(mnt);
dput(dentry);
mntput(mnt);
}
diff --git a/fs/internal.h b/fs/internal.h
index 980d005b21b4..b55575b9b55c 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -85,6 +85,7 @@ extern void __mnt_drop_write(struct vfsmount *);
extern void __mnt_drop_write_file(struct file *);
extern void mnt_drop_write_file_path(struct file *);
+extern void dissolve_on_fput(struct vfsmount *);
/*
* fs_struct.c
*/
diff --git a/fs/namespace.c b/fs/namespace.c
index 5f75969adff1..3281fea98cf0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -20,12 +20,14 @@
#include <linux/init.h> /* init_rootfs */
#include <linux/fs_struct.h> /* get_fs_root et.al. */
#include <linux/fsnotify.h> /* fsnotify_vfsmount_delete */
+#include <linux/file.h>
#include <linux/uaccess.h>
#include <linux/proc_ns.h>
#include <linux/magic.h>
#include <linux/bootmem.h>
#include <linux/task_work.h>
#include <linux/sched/task.h>
+#include <uapi/linux/mount.h>
#include "pnode.h"
#include "internal.h"
@@ -1839,6 +1841,16 @@ struct vfsmount *collect_mounts(const struct path *path)
return &tree->mnt;
}
+void dissolve_on_fput(struct vfsmount *mnt)
+{
+ namespace_lock();
+ lock_mount_hash();
+ mntget(mnt);
+ umount_tree(real_mount(mnt), UMOUNT_SYNC);
+ unlock_mount_hash();
+ namespace_unlock();
+}
+
void drop_collected_mounts(struct vfsmount *mnt)
{
namespace_lock();
@@ -2198,6 +2210,30 @@ static bool has_locked_children(struct mount *mnt, struct dentry *dentry)
return false;
}
+static struct mount *__do_loopback(struct path *old_path, int recurse)
+{
+ struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt);
+
+ if (IS_MNT_UNBINDABLE(old))
+ return mnt;
+
+ if (!check_mnt(old) && old_path->dentry->d_op != &ns_dentry_operations)
+ return mnt;
+
+ if (!recurse && has_locked_children(old, old_path->dentry))
+ return mnt;
+
+ if (recurse)
+ mnt = copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE);
+ else
+ mnt = clone_mnt(old, old_path->dentry, 0);
+
+ if (!IS_ERR(mnt))
+ mnt->mnt.mnt_flags &= ~MNT_LOCKED;
+
+ return mnt;
+}
+
/*
* do loopback mount.
*/
@@ -2205,7 +2241,7 @@ static int do_loopback(struct path *path, const char *old_name,
int recurse)
{
struct path old_path;
- struct mount *mnt = NULL, *old, *parent;
+ struct mount *mnt = NULL, *parent;
struct mountpoint *mp;
int err;
if (!old_name || !*old_name)
@@ -2219,38 +2255,21 @@ static int do_loopback(struct path *path, const char *old_name,
goto out;
mp = lock_mount(path);
- err = PTR_ERR(mp);
- if (IS_ERR(mp))
+ if (IS_ERR(mp)) {
+ err = PTR_ERR(mp);
goto out;
+ }
- old = real_mount(old_path.mnt);
parent = real_mount(path->mnt);
-
- err = -EINVAL;
- if (IS_MNT_UNBINDABLE(old))
- goto out2;
-
if (!check_mnt(parent))
goto out2;
- if (!check_mnt(old) && old_path.dentry->d_op != &ns_dentry_operations)
- goto out2;
-
- if (!recurse && has_locked_children(old, old_path.dentry))
- goto out2;
-
- if (recurse)
- mnt = copy_tree(old, old_path.dentry, CL_COPY_MNT_NS_FILE);
- else
- mnt = clone_mnt(old, old_path.dentry, 0);
-
+ mnt = __do_loopback(&old_path, recurse);
if (IS_ERR(mnt)) {
err = PTR_ERR(mnt);
goto out2;
}
- mnt->mnt.mnt_flags &= ~MNT_LOCKED;
-
err = graft_tree(mnt, parent, mp);
if (err) {
lock_mount_hash();
@@ -2264,6 +2283,75 @@ static int do_loopback(struct path *path, const char *old_name,
return err;
}
+SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, flags)
+{
+ struct file *file;
+ struct path path;
+ int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
+ bool detached = flags & OPEN_TREE_CLONE;
+ int error;
+ int fd;
+
+ BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
+
+ if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE |
+ AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE |
+ OPEN_TREE_CLOEXEC))
+ return -EINVAL;
+
+ if ((flags & (AT_RECURSIVE | OPEN_TREE_CLONE)) == AT_RECURSIVE)
+ return -EINVAL;
+
+ if (flags & AT_NO_AUTOMOUNT)
+ lookup_flags &= ~LOOKUP_AUTOMOUNT;
+ if (flags & AT_SYMLINK_NOFOLLOW)
+ lookup_flags &= ~LOOKUP_FOLLOW;
+ if (flags & AT_EMPTY_PATH)
+ lookup_flags |= LOOKUP_EMPTY;
+
+ if (detached && !may_mount())
+ return -EPERM;
+
+ fd = get_unused_fd_flags(flags & O_CLOEXEC);
+ if (fd < 0)
+ return fd;
+
+ error = user_path_at(dfd, filename, lookup_flags, &path);
+ if (error)
+ goto out;
+
+ if (detached) {
+ struct mount *mnt = __do_loopback(&path, flags & AT_RECURSIVE);
+ if (IS_ERR(mnt)) {
+ error = PTR_ERR(mnt);
+ goto out2;
+ }
+ mntput(path.mnt);
+ path.mnt = &mnt->mnt;
+ }
+
+ file = dentry_open(&path, O_PATH, current_cred());
+ if (IS_ERR(file)) {
+ error = PTR_ERR(file);
+ goto out3;
+ }
+
+ if (detached)
+ file->f_mode |= FMODE_NEED_UNMOUNT;
+ path_put(&path);
+ fd_install(fd, file);
+ return fd;
+
+out3:
+ if (detached)
+ dissolve_on_fput(path.mnt);
+out2:
+ path_put(&path);
+out:
+ put_unused_fd(fd);
+ return error;
+}
+
static int change_mount_flags(struct vfsmount *mnt, int ms_flags)
{
int error = 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 482563fe549c..706b4605bc26 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -154,6 +154,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
/* File is capable of returning -EAGAIN if I/O will block */
#define FMODE_NOWAIT ((__force fmode_t)0x8000000)
+/* File represents mount that needs unmounting */
+#define FMODE_NEED_UNMOUNT ((__force fmode_t)0x10000000)
+
/*
* Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
* that indicates that they should check the contents of the iovec are
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 811172fcb916..925483aba03a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -896,7 +896,7 @@ asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val);
asmlinkage long sys_pkey_free(int pkey);
asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
unsigned mask, struct statx __user *buffer);
-
+asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
/*
* Architecture-specific system calls
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..594b85f7cb86 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -90,5 +90,7 @@
#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
+#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
+
#endif /* _UAPI_LINUX_FCNTL_H */
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
new file mode 100644
index 000000000000..b9c3b46210db
--- /dev/null
+++ b/include/uapi/linux/mount.h
@@ -0,0 +1,7 @@
+#ifndef _UAPI_LINUX_MOUNT_H
+#define _UAPI_LINUX_MOUNT_H
+
+#define OPEN_TREE_CLONE 1
+#define OPEN_TREE_CLOEXEC O_CLOEXEC
+
+#endif /* _UAPI_LINUX_MOUNT_H */
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-03 0:55 ` [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]) Al Viro
@ 2018-06-04 10:34 ` Miklos Szeredi
2018-06-04 15:52 ` Al Viro
2018-06-04 15:27 ` David Howells
2018-06-04 17:16 ` Matthew Wilcox
2 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2018-06-04 10:34 UTC (permalink / raw)
To: Al Viro
Cc: David Howells, Christoph Hellwig, linux-fsdevel, linux-afs, LKML,
Linux API
On Sun, Jun 3, 2018 at 2:55 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sat, Jun 02, 2018 at 06:49:58PM +0100, Al Viro wrote:
>
>> > > Hell, might even add AT_UMOUNT for "open root and detach, to be dissolved on
>> > > close", incompatible with AT_CLONE.
>> >
>> > Cute. Guess you could do:
>> >
>> > fd = open_mount(..., OPEN_TREE_DETACH);
>> > mount_setattr(fd, "",
>> > MOUNT_SETATTR_EMPTY_PATH,
>> > MOUNT_SETATTR_NOSUID, NULL);
>> > move_mount(fd, "", ...);
>
> Hadn't added that yet, but as for the rest of open_tree() - see
> vfs.git#mount-open_tree. open() and its flags are not touched at all.
> Other changes compared to the previous:
> * may_mount() is required for OPEN_TREE_CLONE
> * sys_ni.c cruft is dropped - those make no sense until and unless
> those syscalls become conditional.
>
> Appears to work, combined with move_mount() it yields eqiuvalents of
> mount --{move,bind,rbind}. Combined with mount_setattr(2) (when that
> gets added) we'll get mount -o remount,bind,nodev et.al.
fsopen = create fsfd
fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
fspick = path -> fsfd
move_mount = attach mountfd or move existing
fsinfo = info from path
open_tree = new mountfd from path or clone
mount_setattr = set attr on mountfd
Notice that fsmount() encompasses mount_setattr() + move_mount()
functionality. Split those out and leave fsmount() to actually do
the "fsfd ->mountfd" translation?
fsinfo() name suggests it's in the same class as
fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
think that's slightly confusing.
Rename move_mount() -> mount_move()?
Also does it make sense to make the cloning behavior of open_tree()
optional? Without cloning it's just a plain open(O_PATH). That way
it could be renamed mount_clone().
Thanks,
Miklos
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-03 0:55 ` [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]) Al Viro
2018-06-04 10:34 ` Miklos Szeredi
@ 2018-06-04 15:27 ` David Howells
2018-06-04 17:16 ` Matthew Wilcox
2 siblings, 0 replies; 19+ messages in thread
From: David Howells @ 2018-06-04 15:27 UTC (permalink / raw)
To: Miklos Szeredi
Cc: dhowells, Al Viro, Christoph Hellwig, linux-fsdevel, linux-afs,
LKML, Linux API
Miklos Szeredi <miklos@szeredi.hu> wrote:
> fsinfo = info from path
Actually, I was thinking of making fsinfo() detect if it's been given an fsfd
and go through an fs_context operation instead in that case.
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-04 10:34 ` Miklos Szeredi
@ 2018-06-04 15:52 ` Al Viro
2018-06-04 15:59 ` Al Viro
2018-06-04 19:27 ` Miklos Szeredi
0 siblings, 2 replies; 19+ messages in thread
From: Al Viro @ 2018-06-04 15:52 UTC (permalink / raw)
To: Miklos Szeredi
Cc: David Howells, Christoph Hellwig, linux-fsdevel, linux-afs, LKML,
Linux API
On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:
> fsopen = create fsfd
> fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
> fspick = path -> fsfd
> move_mount = attach mountfd or move existing
> fsinfo = info from path
> open_tree = new mountfd from path or clone
> mount_setattr = set attr on mountfd
>
> Notice that fsmount() encompasses mount_setattr() + move_mount()
> functionality. Split those out and leave fsmount() to actually do
> the "fsfd ->mountfd" translation?
Might make sense.
> fsinfo() name suggests it's in the same class as
> fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
> think that's slightly confusing.
>
> Rename move_mount() -> mount_move()?
mount_move_bikeshed_bikeshed_bikeshed(), surely?
> Also does it make sense to make the cloning behavior of open_tree()
> optional? Without cloning it's just a plain open(O_PATH). That way
> it could be renamed mount_clone().
Umm... I'm not sure about that one. If nothing else, OPEN_TREE_DETACH
might be a good idea, in which case cloning is not the primary effect;
hell knows.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-04 15:52 ` Al Viro
@ 2018-06-04 15:59 ` Al Viro
2018-06-04 19:27 ` Miklos Szeredi
1 sibling, 0 replies; 19+ messages in thread
From: Al Viro @ 2018-06-04 15:59 UTC (permalink / raw)
To: Miklos Szeredi
Cc: David Howells, Christoph Hellwig, linux-fsdevel, linux-afs, LKML,
Linux API
On Mon, Jun 04, 2018 at 04:52:05PM +0100, Al Viro wrote:
> On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:
>
> > fsopen = create fsfd
> > fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
> > fspick = path -> fsfd
> > move_mount = attach mountfd or move existing
> > fsinfo = info from path
> > open_tree = new mountfd from path or clone
> > mount_setattr = set attr on mountfd
> >
> > Notice that fsmount() encompasses mount_setattr() + move_mount()
> > functionality. Split those out and leave fsmount() to actually do
> > the "fsfd ->mountfd" translation?
>
> Might make sense.
FWIW, to make it clear: fsmount(2) in this series actually does *NOT*
attach it to the tree. Commit message definitely needs updating - as it
is, it's
+SYSCALL_DEFINE5(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags,
+ void *, reserved4, void *, reserved5)
PS: IMO these reserved... arguments are in bad taste; if anyone has good reasons
for that practice in ABI design, I'd like to hear those.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-03 0:55 ` [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]) Al Viro
2018-06-04 10:34 ` Miklos Szeredi
2018-06-04 15:27 ` David Howells
@ 2018-06-04 17:16 ` Matthew Wilcox
2018-06-04 17:35 ` Al Viro
2 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2018-06-04 17:16 UTC (permalink / raw)
To: Al Viro
Cc: David Howells, Christoph Hellwig, linux-fsdevel, linux-afs,
linux-kernel, linux-api
On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
> +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, flags)
> +{
> + struct file *file;
> + struct path path;
> + int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
> + bool detached = flags & OPEN_TREE_CLONE;
> + int error;
> + int fd;
> +
> + BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
Why do we need OPEN_TREE_CLOEXEC? Wouldn't we be better off just making
the fd returned by open_tree implicitly close-on-exec? I can think of
no good reason for these file descriptors to be inherited across exec()
and if someone comes up with such a reason, fcntl(F_SETFD) is not an
expensive call to make.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-04 17:16 ` Matthew Wilcox
@ 2018-06-04 17:35 ` Al Viro
2018-06-04 19:38 ` Miklos Szeredi
0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2018-06-04 17:35 UTC (permalink / raw)
To: Matthew Wilcox
Cc: David Howells, Christoph Hellwig, linux-fsdevel, linux-afs,
linux-kernel, linux-api
On Mon, Jun 04, 2018 at 10:16:30AM -0700, Matthew Wilcox wrote:
> On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
> > +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, flags)
> > +{
> > + struct file *file;
> > + struct path path;
> > + int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
> > + bool detached = flags & OPEN_TREE_CLONE;
> > + int error;
> > + int fd;
> > +
> > + BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
>
> Why do we need OPEN_TREE_CLOEXEC? Wouldn't we be better off just making
> the fd returned by open_tree implicitly close-on-exec? I can think of
> no good reason for these file descriptors to be inherited across exec()
How are they different from any file descriptor? It's not as if it was
something usable only for mounting stuff - again, you can use them
with any ...at() syscalls.
> and if someone comes up with such a reason, fcntl(F_SETFD) is not an
> expensive call to make.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-04 15:52 ` Al Viro
2018-06-04 15:59 ` Al Viro
@ 2018-06-04 19:27 ` Miklos Szeredi
1 sibling, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2018-06-04 19:27 UTC (permalink / raw)
To: Al Viro
Cc: David Howells, Christoph Hellwig, linux-fsdevel, linux-afs, LKML,
Linux API
On Mon, Jun 4, 2018 at 5:52 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:
>
>> fsopen = create fsfd
>> fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
>> fspick = path -> fsfd
>> move_mount = attach mountfd or move existing
>> fsinfo = info from path
>> open_tree = new mountfd from path or clone
>> mount_setattr = set attr on mountfd
>>
>> Notice that fsmount() encompasses mount_setattr() + move_mount()
>> functionality. Split those out and leave fsmount() to actually do
>> the "fsfd ->mountfd" translation?
>
> Might make sense.
> FWIW, to make it clear: fsmount(2) in this series actually does *NOT*
> attach it to the tree.
Ah, that leaves the mount_setattr() functionality to split out. I'd
be more happy to rid this new API of all the old MS_* crap and have
have a new set of attributes, that just apply to mounts. It will
also need two args: a bitmap of new attributes and a mask to tell us
which attributes to change.
> Commit message definitely needs updating - as it
> is, it's
>
> +SYSCALL_DEFINE5(fsmount, int, fs_fd, unsigned int, flags, unsigned int, ms_flags,
> + void *, reserved4, void *, reserved5)
>
> PS: IMO these reserved... arguments are in bad taste; if anyone has good reasons
> for that practice in ABI design, I'd like to hear those.
Agreed. A flags argument is often wise to add even if currently
unused (and should be checked for undefined flags), but adding a
random number of pointers doesn't seem to make a lot of sense.
>
>> fsinfo() name suggests it's in the same class as
>> fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
>> think that's slightly confusing.
>>
>> Rename move_mount() -> mount_move()?
>
> mount_move_bikeshed_bikeshed_bikeshed(), surely?
Consistent naming for related functions... not unheard of in API
design. The above set definitely does not qualify.
>> Also does it make sense to make the cloning behavior of open_tree()
>> optional? Without cloning it's just a plain open(O_PATH). That way
>> it could be renamed mount_clone().
>
> Umm... I'm not sure about that one. If nothing else, OPEN_TREE_DETACH
> might be a good idea, in which case cloning is not the primary effect;
> hell knows.
So conceptually we have the following distinct mount tree operations:
treefd = clone(path);
treefd = detach(path);
attach(treefd, path);
move(path1, path2);
The detach/move/attach trio are more related in functionality, while
clone and detach have the same signature. I'm not sure either.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])
2018-06-04 17:35 ` Al Viro
@ 2018-06-04 19:38 ` Miklos Szeredi
0 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2018-06-04 19:38 UTC (permalink / raw)
To: Al Viro
Cc: Matthew Wilcox, David Howells, Christoph Hellwig, linux-fsdevel,
linux-afs, LKML, Linux API
On Mon, Jun 4, 2018 at 7:35 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Jun 04, 2018 at 10:16:30AM -0700, Matthew Wilcox wrote:
>> On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
>> > +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, flags)
>> > +{
>> > + struct file *file;
>> > + struct path path;
>> > + int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
>> > + bool detached = flags & OPEN_TREE_CLONE;
>> > + int error;
>> > + int fd;
>> > +
>> > + BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
>>
>> Why do we need OPEN_TREE_CLOEXEC? Wouldn't we be better off just making
>> the fd returned by open_tree implicitly close-on-exec? I can think of
>> no good reason for these file descriptors to be inherited across exec()
>
> How are they different from any file descriptor? It's not as if it was
> something usable only for mounting stuff - again, you can use them
> with any ...at() syscalls.
Defaulting to close on exec helps keep out clutter from the API.
Is there a disadvantage to needing an explicit fcntl(F_SETFD) call to
disable close on exec?
Thanks,
Miklos
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2018-06-04 19:38 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <152720672288.9073.9868393448836301272.stgit@warthog.procyon.org.uk>
[not found] ` <152720691829.9073.10564431140980997005.stgit@warthog.procyon.org.uk>
2018-06-01 6:26 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Christoph Hellwig
2018-06-01 6:39 ` Al Viro
2018-06-01 8:27 ` David Howells
2018-06-02 3:09 ` Al Viro
2018-06-02 3:42 ` Al Viro
2018-06-02 4:04 ` Al Viro
2018-06-02 15:45 ` David Howells
2018-06-02 17:49 ` Al Viro
2018-06-03 0:55 ` [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]) Al Viro
2018-06-04 10:34 ` Miklos Szeredi
2018-06-04 15:52 ` Al Viro
2018-06-04 15:59 ` Al Viro
2018-06-04 19:27 ` Miklos Szeredi
2018-06-04 15:27 ` David Howells
2018-06-04 17:16 ` Matthew Wilcox
2018-06-04 17:35 ` Al Viro
2018-06-04 19:38 ` Miklos Szeredi
2018-06-01 8:02 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Amir Goldstein
2018-06-01 8:42 ` David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).