Re: User-visible context-mount API

util-linux.vger.kernel.org archive mirror

* Re: User-visible context-mount API
       [not found]       ` <20180117041727.GS13338@ZenIV.linux.org.uk>
@ 2018-01-17  9:53         ` Miklos Szeredi
  2018-01-17 11:06           ` Karel Zak
  2018-01-19  6:32           ` Al Viro
  0 siblings, 2 replies; 5+ messages in thread
From: Miklos Szeredi @ 2018-01-17  9:53 UTC (permalink / raw)
  To: Al Viro
  Cc: David Howells, Jeff Layton, Eric W. Biederman, linux-fsdevel,
	Linux API, util-linux, Michael Kerrisk (man-pages)

[Adding util-linux@vger and Michael Kerrisk]

On Wed, Jan 17, 2018 at 5:17 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jan 16, 2018 at 05:41:46PM +0100, Miklos Szeredi wrote:
>
>> Right.
>>
>> Still, those two (propagation and flags) are properties of the mount.
>> No fundamental difference in how to handle them, that I see.  Okay, we
>> have MS_REC handling in the propagation and not in the flags, but
>> that's something that might make sense for flags as well.
>>
>> What's more interesting is how MS_PRIVATE + MS_REC semantics are a
>> complete failure in the real world: the logical thing would be to mark
>> a mount private on the supplied mount AND propagate an umount event to
>> everywhere else.
>
> This is utter nonsense.  Most of the time it's "Fedora, in its infinite
> bogo^Wwisdom has made everything shared; I don't fucking need that
> idiocy, so please unshare this, this and that".  You really don't want
> (or have permissions for) unmounting e.g. /mnt in namespace of init
> when you do that.
>
> Sure, we get tons of bug reports.  Due to idiotic Fedora setup, with
> everything shared.  The same setup that would go up in flames on the
> semantics change you propose.

I wouldn't propose to change existing --make-private, as this would
not be backward compatible. The new semantics would mean a new op,
obviously.

Documenting the --make-private thing properly would also help.  To me the
wording "make private" strongly implies "I want to make submounts
private to this instance".  See for example rhbz#1432211.

> If anything, "private bind on itself" would be a useful operation.
> Turning given location into a mountpoint, and having everything
> under it looking as it used to, but with no propagation at all.
> Without bothering anybody else, even if location currently happens
> to be on a shared/master mount.
>
> I can slap that together for mount(2), but I'm not sure what a sane
> combination of flags for that would look like ;-)  For fsmount
> I think it would be very useful thing to have.

Yes, I think such an operation would be pretty useful.   Not sure if
it's the whole story, though.
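
For comparison, the closest thing available through the existing mount(2)
interface is a two-step sequence: a recursive bind of the location onto
itself, followed by a recursive MS_PRIVATE.  A minimal sketch follows; the
function name private_bind_on_itself() is made up for illustration.  This is
only an approximation of the operation discussed above: the bind in step 1 is
an ordinary mount event, so it still propagates to peers if the location sits
on a shared mount.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not part of any library): approximate a
 * "private bind on itself" using two existing mount(2) calls.
 * Returns 0 on success or a negative errno value on failure.
 */
int private_bind_on_itself(const char *path)
{
    assert(path != NULL);

    /* Step 1: recursively bind the location onto itself, which turns
     * it into a mountpoint with the same contents visible under it. */
    if (mount(path, path, NULL, MS_BIND | MS_REC, NULL) != 0)
        return -errno;

    /* Step 2: mark the new mount and everything under it private.
     * For propagation changes the source, fstype and data arguments
     * are ignored. */
    if (mount(NULL, path, NULL, MS_PRIVATE | MS_REC, NULL) != 0)
        return -errno;

    return 0;
}
```

Calling private_bind_on_itself("/some/dir") needs CAP_SYS_ADMIN in the
caller's mount namespace; without it the first call fails with EPERM.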

Thanks,
Miklos


* Re: User-visible context-mount API
  2018-01-17  9:53         ` User-visible context-mount API Miklos Szeredi
@ 2018-01-17 11:06           ` Karel Zak
  2018-01-18  9:48             ` Miklos Szeredi
  2018-01-19  2:27             ` Al Viro
  2018-01-19  6:32           ` Al Viro
  1 sibling, 2 replies; 5+ messages in thread
From: Karel Zak @ 2018-01-17 11:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, David Howells, Jeff Layton, Eric W. Biederman,
	linux-fsdevel, Linux API, util-linux, Michael Kerrisk (man-pages)

On Wed, Jan 17, 2018 at 10:53:36AM +0100, Miklos Szeredi wrote:
> [Adding util-linux@vger and Michael Kerrisk]
> 
> On Wed, Jan 17, 2018 at 5:17 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Tue, Jan 16, 2018 at 05:41:46PM +0100, Miklos Szeredi wrote:
> >
> >> Right.
> >>
> >> Still, those two (propagation and flags) are properties of the mount.
> >> No fundamental difference in how to handle them, that I see.  Okay, we
> >> have MS_REC handling in the propagation and not in the flags, but
> >> that's something that might make sense for flags as well.
> >>
> >> What's more interesting is how MS_PRIVATE + MS_REC semantics are a
> >> complete failure in the real world: the logical thing would be to mark
> >> a mount private on the supplied mount AND propagate an umount event to
> >> everywhere else.
> >
> > This is utter nonsense.  Most of the time it's "Fedora, in its infinite
> > bogo^Wwisdom has made everything shared; I don't fucking need that
> > idiocy, so please unshare this, this and that".  You really don't want
> > (or have permissions for) unmounting e.g. /mnt in namespace of init
> > when you do that.
> >
> > Sure, we get tons of bug reports.  Due to idiotic Fedora setup, with
> > everything shared.  The same setup that would go up in flames on the
> > semantics change you propose.

I guess "all shared" is a systemd requirement, so it's not
Fedora-specific, right?

> I wouldn't propose to change existing --make-private, as this would
> not be backward compatible. The new semantics would mean a new op,
> obviously.

Definitely.

> Documenting the --make-private thing properly would also help.  To me the
> wording "make private" strongly implies "I want to make submounts
> private to this instance".  See for example rhbz#1432211.

All the propagation stuff is poorly documented in mount.8. It would be
nice to add a section about it to the man page. Volunteer? (My skills at
explaining this topic to end-users are pretty limited...)
 
> > If anything, "private bind on itself" would be a useful operation.
> > Turning given location into a mountpoint, and having everything
> > under it looking as it used to, but with no propagation at all.
> > Without bothering anybody else, even if location currently happens
> > to be on a shared/master mount.

Good idea.

> > I can slap that together for mount(2), but I'm not sure what a sane
> > combination of flags for that would look like ;-)

What about a new flag (for the API) rather than trying to be smart with
the current flags? But I doubt that investing time in new mount(2)
features is a good idea.

> For fsmount I think it would be very useful thing to have.

Yes.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: User-visible context-mount API
  2018-01-17 11:06           ` Karel Zak
@ 2018-01-18  9:48             ` Miklos Szeredi
  2018-01-19  2:27             ` Al Viro
  1 sibling, 0 replies; 5+ messages in thread
From: Miklos Szeredi @ 2018-01-18  9:48 UTC (permalink / raw)
  To: Karel Zak
  Cc: Al Viro, David Howells, Jeff Layton, Eric W. Biederman,
	linux-fsdevel, Linux API, util-linux, Michael Kerrisk (man-pages)

On Wed, Jan 17, 2018 at 12:06 PM, Karel Zak <kzak@redhat.com> wrote:

>> Documenting the --make-private thing properly would also help.  To me the
>> wording "make private" strongly implies "I want to make submounts
>> private to this instance".  See for example rhbz#1432211.
>
> All the propagation stuff is poorly documented in mount.8. It would be
> nice to add a section about it to the man page. Volunteer? (My skills at
> explaining this topic to end-users are pretty limited...)

Propagation is common to mount(2) and there's some info in there
already, but fine points like this are not explained.

Maybe a new page in section 7?

Thanks,
Miklos


* Re: User-visible context-mount API
  2018-01-17 11:06           ` Karel Zak
  2018-01-18  9:48             ` Miklos Szeredi
@ 2018-01-19  2:27             ` Al Viro
  1 sibling, 0 replies; 5+ messages in thread
From: Al Viro @ 2018-01-19  2:27 UTC (permalink / raw)
  To: Karel Zak
  Cc: Miklos Szeredi, David Howells, Jeff Layton, Eric W. Biederman,
	linux-fsdevel, Linux API, util-linux, Michael Kerrisk (man-pages)

On Wed, Jan 17, 2018 at 12:06:33PM +0100, Karel Zak wrote:

> What about a new flag (for the API) rather than trying to be smart with
> the current flags? But I doubt that investing time in new mount(2)
> features is a good idea.

Would be nice, if we had any spare bits left...  We could, in principle,
turn
#define MS_BIND         4096
#define MS_MOVE         8192
into
#define MS_BIND         0x1000
#define MS_MOVE         0x2000
#define MS_SOMETHING    0x3000
seeing that they should never be used together, but... mount(2)
doesn't reject MS_BIND|MS_MOVE and treats it as MS_BIND instead.
_Probably_ nothing would care, but it risks breaking userland.

We could use one of the internal-only bits for that instead, but
they are also quietly ignored and not rejected, so that would
have the same problem.

mount(2) ABI sucks, film at 11...
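
The collision described above can be checked mechanically.  The sketch below
uses the real MS_BIND and MS_MOVE constants from <sys/mount.h>;
MS_SOMETHING_HYPOTHETICAL stands in for the MS_SOMETHING encoding floated
above and is not a real kernel flag, and the helper name is likewise made up.

```c
#include <assert.h>
#include <sys/mount.h>

/* Hypothetical value from the discussion above; NOT a real kernel flag. */
#define MS_SOMETHING_HYPOTHETICAL 0x3000UL

/* The decimal and hexadecimal spellings are the same bits. */
static_assert(MS_BIND == 0x1000, "MS_BIND is 4096");
static_assert(MS_MOVE == 0x2000, "MS_MOVE is 8192");

/*
 * Returns 1 if a flags word from an existing caller would be read as
 * the hypothetical new operation, i.e. if it has both MS_BIND and
 * MS_MOVE set, which mount(2) currently accepts and treats as MS_BIND.
 */
int collides_with_proposed(unsigned long flags)
{
    return (flags & (MS_BIND | MS_MOVE)) == MS_SOMETHING_HYPOTHETICAL;
}
```

Any caller that passes MS_BIND|MS_MOVE today, deliberately or by accident,
gets a bind mount; under the combined encoding the same word would mean
something else, which is the userland breakage risk mentioned above.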


* Re: User-visible context-mount API
  2018-01-17  9:53         ` User-visible context-mount API Miklos Szeredi
  2018-01-17 11:06           ` Karel Zak
@ 2018-01-19  6:32           ` Al Viro
  1 sibling, 0 replies; 5+ messages in thread
From: Al Viro @ 2018-01-19  6:32 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: David Howells, Jeff Layton, Eric W. Biederman, linux-fsdevel,
	Linux API, util-linux, Michael Kerrisk (man-pages)

On Wed, Jan 17, 2018 at 10:53:36AM +0100, Miklos Szeredi wrote:

> Documenting the --make-private thing properly would also help.  To me the
> wording "make private" strongly implies "I want to make submounts
> private to this instance".  See for example rhbz#1432211.
> 
> > If anything, "private bind on itself" would be a useful operation.
> > Turning given location into a mountpoint, and having everything
> > under it looking as it used to, but with no propagation at all.
> > Without bothering anybody else, even if location currently happens
> > to be on a shared/master mount.
> >
> > I can slap that together for mount(2), but I'm not sure what a sane
> > combination of flags for that would look like ;-)  For fsmount
> > I think it would be very useful thing to have.
> 
> Yes, I think such an operation would be pretty useful.   Not sure if
> it's the whole story, though.

FWIW, there's a fun variant of the API:

* fsopen(): string -> fsfd; takes fs type name, returns a file descriptor
connected to fs driver.  Subsequent read/write on it is used to pass
options, authenticate, etc. - all you need to talk the driver into
creating an active instance.

* fspick(): location -> fsfd; fsfd connected to filesystem mounted at given
place.  After that you can talk to the driver to get superblock-level
remount.

* new_mount(): fsfd x string -> fd.  Creates a vfsmount and gives a file
descriptor for given relative pathname.

* clone_mount(): location x bool -> fd.  Copies a vfsmount or an entire
subtree (depending upon the second parameter) and returns a file descriptor.
Basically, bind or rbind sans attaching it anywhere.

* change_flags(): fd x (propagation or vfsmount flags) x bool -> int.
fd should point to root of some vfsmount (O_PATH, or the result of either
of the previous two).  The bool is "do we want it to affect the entire
subtree"; the tricky question is what to do with vfsmount flags - for those
we might want things like "here's the full set" or "change those flags thus".
Hell knows - there might be two primitives there; the second one
would be fd x mask x new_flags x bool -> int, as in "set the bits
present in mask to values as in new_flags".  Not sure.

* move_mount(): fd x location x bool -> int.  fd - what to move, location -
where to put it, bool - do we want to suppress propagation.  Potentially
hacky part is that if fd is not attached to anything, we simply attach it;
otherwise - move.
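
The second change_flags() variant mentioned in the list above, "set the bits
present in mask to values as in new_flags", is the usual masked update.  A
minimal sketch, with made-up function names and the MS_* constants from
<sys/mount.h> used purely as sample bit values (the proposal does not say
which flag namespace it would use):

```c
#include <assert.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: bits selected by mask take their value from
 * new_flags; all other bits of current are left untouched.
 */
unsigned long apply_masked_flags(unsigned long current,
                                 unsigned long mask,
                                 unsigned long new_flags)
{
    return (current & ~mask) | (new_flags & mask);
}

/* Small self-check: clear read-only and set noexec, leave nosuid alone. */
void demo_masked_update(void)
{
    unsigned long current = MS_RDONLY | MS_NOSUID;
    unsigned long mask = MS_RDONLY | MS_NOEXEC;
    unsigned long result = apply_masked_flags(current, mask, MS_NOEXEC);

    assert(result == (MS_NOSUID | MS_NOEXEC));
}
```

The "here's the full set" variant is then just the special case where mask
covers every flag the call is allowed to change.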

Normal mount: fsopen; talk to driver; new_mount; move_mount; close descriptors
mount --bind: fd = clone_mount(old, false); move_mount(fd, new, false); close
mount --rbind: clone_mount(old, true); move_mount; close
mount --make-shared et al.: open(..., O_PATH); change_flags; close
mount --move: open; move_mount; close
vfsmount-level remount: open; change_flags (or change_mount_flags, if we keep
it separate from topology ones); close
sb-level remount: fspick; talk to driver; close
make an arbitrary subtree really private (as discussed upthread):
	fd = clone_mount(old, true); change_flags (or change_propagation_flags);
	move_mount(fd, old, true); close(fd);

The tricky part in terms of implementation is that we want a
tree created by clone_mount() and never attached anywhere to be
dissolved on the final close() of the result of clone_mount().
It's not quite O_PATH - we want file_operations for that sucker
that would have ->release() doing that.

It would do namespace_lock(), check ->f_path.mnt for a flag and do
umount_tree() if not set, then namespace_unlock().  move_mount()
would set the flag.
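
The lifetime rule above can be modelled in user space.  This is a toy sketch
only: struct toy_tree and the toy_* functions are invented for illustration
and stand in for the cloned tree, clone_mount(), move_mount() and the
->release() method; the real thing would take namespace_lock() and call
umount_tree() where the comment indicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for the open file that refers to a cloned mount tree. */
struct toy_tree {
    bool attached;  /* models the flag that move_mount() would set */
};

/* Models clone_mount(): the result starts out attached nowhere. */
struct toy_tree *toy_clone_mount(void)
{
    return calloc(1, sizeof(struct toy_tree));
}

/* Models move_mount(): attaching the tree sets the flag. */
void toy_move_mount(struct toy_tree *t)
{
    assert(t != NULL);
    t->attached = true;
}

/*
 * Models ->release() on the final close().  Returns true if the tree
 * would be dissolved, i.e. it was never attached.  The toy frees its
 * bookkeeping either way; in the kernel the umount_tree() call would
 * happen only in the "dissolve" case, under namespace_lock().
 */
bool toy_release(struct toy_tree *t)
{
    assert(t != NULL);
    bool dissolve = !t->attached;
    free(t);
    return dissolve;
}
```

In this model the "mount --bind" recipe is toy_clone_mount(),
toy_move_mount(), toy_release(), and the release reports that nothing needs
dissolving; dropping the toy_move_mount() step makes the release report a
dissolve instead.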

