* Regression for MS_MOVE on kernel v5.1 @ 2019-06-12 22:54 Christian Brauner 2019-06-13 4:00 ` Linus Torvalds 2019-06-13 9:27 ` David Howells 0 siblings, 2 replies; 9+ messages in thread From: Christian Brauner @ 2019-06-12 22:54 UTC (permalink / raw) To: viro, linux-kernel, torvalds, linux-fsdevel, linux-api, dhowells Hey, Sorry to be the bearer of bad news but I think I observed a pretty gnarly regression for userspace with MS_MOVE from kernel v5.1 onwards. When propagating mounts across mount namespaces owned by different user namespaces it is not possible anymore to move the mount in the less privileged mount namespace. Here is a reproducer: sudo mount -t tmpfs tmpfs /mnt sudo --make-rshared /mnt # create unprivileged user + mount namespace and preserve propagation unshare -U -m --map-root --propagation=unchanged # now change back to the original mount namespace in another terminal: sudo mkdir /mnt/aaa sudo mount -t tmpfs tmpfs /mnt/aaa # now in the unprivileged user + mount namespace mount --move /mnt/aaa /opt This will work on kernels prior to 5.1 but will fail on kernels starting with 5.1. Unfortunately, this is a pretty big deal for userspace. In LXD - which I maintain when not doing kernel stuff - we use this mechanism to inject mounts into running unprivileged containers. Users started reporting failures against our mount injection feature just a short while ago (cf. [1], [2]) and I just came around to looking into this today. I tracked this down to commit: commit 3bd045cc9c4be2049602b47505256b43908b4e2f Author: Al Viro <viro@zeniv.linux.org.uk> Date: Wed Jan 30 13:15:45 2019 -0500 separate copying and locking mount tree on cross-userns copies Rather than having propagate_mnt() check doing unprivileged copies, lock them before commit_tree(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> reverting it makes MS_MOVE to work correctly again. The commit changes the internal logic to lock mounts when propagating mounts (user+)mount namespaces and - I believe - causes do_mount_move() to fail at: if (old->mnt.mnt_flags & MNT_LOCKED) goto out; If that's indeed the case we should either revert this commit (reverts cleanly, just tested it) or find a fix. Thanks! Christian [1]: https://github.com/lxc/lxd/issues/5788 [2]: https://github.com/lxc/lxd/issues/5836 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Regression for MS_MOVE on kernel v5.1 2019-06-12 22:54 Regression for MS_MOVE on kernel v5.1 Christian Brauner @ 2019-06-13 4:00 ` Linus Torvalds 2019-06-13 13:22 ` Christian Brauner 2019-06-13 9:27 ` David Howells 1 sibling, 1 reply; 9+ messages in thread From: Linus Torvalds @ 2019-06-13 4:00 UTC (permalink / raw) To: Christian Brauner Cc: Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API, David Howells On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote: > > The commit changes the internal logic to lock mounts when propagating > mounts (user+)mount namespaces and - I believe - causes do_mount_move() > to fail at: You mean 'do_move_mount()'. > if (old->mnt.mnt_flags & MNT_LOCKED) > goto out; > > If that's indeed the case we should either revert this commit (reverts > cleanly, just tested it) or find a fix. Hmm.. I'm not entirely sure of the logic here, and just looking at that commit 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies") doesn't make me go "Ahh" either. Al? My gut feel is that we need to just revert, since this was in 5.1 and it's getting reasonably late in 5.2 too. But maybe you go "guys, don't be silly, this is easily fixed with this one-liner". Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Regression for MS_MOVE on kernel v5.1 2019-06-13 4:00 ` Linus Torvalds @ 2019-06-13 13:22 ` Christian Brauner 2019-06-13 18:34 ` Eric W. Biederman 0 siblings, 1 reply; 9+ messages in thread From: Christian Brauner @ 2019-06-13 13:22 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API, David Howells, Eric W. Biederman On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote: > On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote: > > > > The commit changes the internal logic to lock mounts when propagating > > mounts (user+)mount namespaces and - I believe - causes do_mount_move() > > to fail at: > > You mean 'do_move_mount()'. > > > if (old->mnt.mnt_flags & MNT_LOCKED) > > goto out; > > > > If that's indeed the case we should either revert this commit (reverts > > cleanly, just tested it) or find a fix. > > Hmm.. I'm not entirely sure of the logic here, and just looking at > that commit 3bd045cc9c4b ("separate copying and locking mount tree on > cross-userns copies") doesn't make me go "Ahh" either. > > Al? My gut feel is that we need to just revert, since this was in 5.1 > and it's getting reasonably late in 5.2 too. But maybe you go "guys, > don't be silly, this is easily fixed with this one-liner". David and I have been staring at that code today for a while together. I think I made some sense of it. One thing we weren't absolutely sure is if the old MS_MOVE behavior was intentional or a bug. If it is a bug we have a problem since we quite heavily rely on this... So this whole cross-user+mnt namespace propagation mechanism comes with a big hammer that Eric indeed did introduce a while back which is MNT_LOCKED (cf. [1] for the relevant commit). Afaict, MNT_LOCKED is (among other cases) supposed to prevent a user+mnt namespace pair to get access to a mount that is hidden underneath an additional mount. Consider the following scenario: sudo mount -t tmpfs tmpfs /mnt sudo mount --make-rshared /mnt sudo mount -t tmpfs tmpfs /mnt sudo mount --make-rshared /mnt unshare -U -m --map-root --propagation=unchanged umount /mnt # or mount --move -mnt /opt The last umount/MS_MOVE is supposed to fail since the mount is locked with MNT_LOCKED since umounting or MS_MOVing the mount would reveal the underlying mount which I didn't have access to prior to the creation of my user+mnt namespace pair. (Whether or not this is a reasonable security mechanism is a separate discussion.) But now consider the case where from the ancestor user+mnt namespace pair I do: # propagate the mount to the user+mount namespace pair sudo mount -t tmpfs tmpfs /mnt # switch to the child user+mnt namespace pair umount /mnt # or mount --move /mnt /opt That umount/MS_MOVE should work since that mount was propagated to the unprivileged task after the user+mnt namespace pair was created. Also, because I already had access to the underlying mount in the first place and second because this is literally the only way - we know of - to inject a mount cross mount namespaces and this is a must have feature that quite a lot of users rely on. Christian [1]: git show 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Regression for MS_MOVE on kernel v5.1 2019-06-13 13:22 ` Christian Brauner @ 2019-06-13 18:34 ` Eric W. Biederman 2019-06-13 20:25 ` Miklos Szeredi 0 siblings, 1 reply; 9+ messages in thread From: Eric W. Biederman @ 2019-06-13 18:34 UTC (permalink / raw) To: Christian Brauner Cc: Linus Torvalds, Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API, David Howells Christian Brauner <christian@brauner.io> writes: > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote: >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote: >> > >> > The commit changes the internal logic to lock mounts when propagating >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move() >> > to fail at: >> >> You mean 'do_move_mount()'. >> >> > if (old->mnt.mnt_flags & MNT_LOCKED) >> > goto out; >> > >> > If that's indeed the case we should either revert this commit (reverts >> > cleanly, just tested it) or find a fix. >> >> Hmm.. I'm not entirely sure of the logic here, and just looking at >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on >> cross-userns copies") doesn't make me go "Ahh" either. >> >> Al? My gut feel is that we need to just revert, since this was in 5.1 >> and it's getting reasonably late in 5.2 too. But maybe you go "guys, >> don't be silly, this is easily fixed with this one-liner". > > David and I have been staring at that code today for a while together. > I think I made some sense of it. > One thing we weren't absolutely sure is if the old MS_MOVE behavior was > intentional or a bug. If it is a bug we have a problem since we quite > heavily rely on this... It was intentional. The only mounts that are locked in propagation are the mounts that propagate together. If you see the mounts come in as individuals you can always see/manipulate/work with the underlying mount. I can think of only a few ways for MNT_LOCKED to become set: a) unshare(CLONE_NEWNS) b) mount --rclone /path/to/mnt/tree /path/to/propagation/point c) mount --move /path/to/mnt/tree /path/to/propgation/point Nothing in the target namespace should be locked on the propgation point but all of the new mounts that came across as a unit should be locked together. > So this whole cross-user+mnt namespace propagation mechanism comes with > a big hammer that Eric indeed did introduce a while back which is > MNT_LOCKED (cf. [1] for the relevant commit). > > Afaict, MNT_LOCKED is (among other cases) supposed to prevent a user+mnt > namespace pair to get access to a mount that is hidden underneath an > additional mount. Consider the following scenario: > > sudo mount -t tmpfs tmpfs /mnt > sudo mount --make-rshared /mnt > sudo mount -t tmpfs tmpfs /mnt > sudo mount --make-rshared /mnt > unshare -U -m --map-root --propagation=unchanged > > umount /mnt > # or > mount --move -mnt /opt > > The last umount/MS_MOVE is supposed to fail since the mount is locked > with MNT_LOCKED since umounting or MS_MOVing the mount would reveal the > underlying mount which I didn't have access to prior to the creation of > my user+mnt namespace pair. > (Whether or not this is a reasonable security mechanism is a separate > discussion.) > > But now consider the case where from the ancestor user+mnt namespace > pair I do: > > # propagate the mount to the user+mount namespace pair > sudo mount -t tmpfs tmpfs /mnt > # switch to the child user+mnt namespace pair > umount /mnt > # or > mount --move /mnt /opt > > That umount/MS_MOVE should work since that mount was propagated to the > unprivileged task after the user+mnt namespace pair was created. > Also, because I already had access to the underlying mount in the first > place and second because this is literally the only way - we know of - > to inject a mount cross mount namespaces and this is a must have feature > that quite a lot of users rely on. Then it breaking is definitely a regression that needs to be fixed. I believe the problematic change as made because the new mount api allows attaching floating mounts. Or that was the plan last I looked. Those floating mounts don't have a mnt_ns so will result in a NULL pointer dereference when they are attached. So I suspect fixing this is not as simple as reverting a single patch. Eric ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Regression for MS_MOVE on kernel v5.1 2019-06-13 18:34 ` Eric W. Biederman @ 2019-06-13 20:25 ` Miklos Szeredi 2019-06-13 21:59 ` Eric W. Biederman 0 siblings, 1 reply; 9+ messages in thread From: Miklos Szeredi @ 2019-06-13 20:25 UTC (permalink / raw) To: Eric W. Biederman Cc: Christian Brauner, Linus Torvalds, Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API, David Howells On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Christian Brauner <christian@brauner.io> writes: > > > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote: > >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote: > >> > > >> > The commit changes the internal logic to lock mounts when propagating > >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move() > >> > to fail at: > >> > >> You mean 'do_move_mount()'. > >> > >> > if (old->mnt.mnt_flags & MNT_LOCKED) > >> > goto out; > >> > > >> > If that's indeed the case we should either revert this commit (reverts > >> > cleanly, just tested it) or find a fix. > >> > >> Hmm.. I'm not entirely sure of the logic here, and just looking at > >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on > >> cross-userns copies") doesn't make me go "Ahh" either. > >> > >> Al? My gut feel is that we need to just revert, since this was in 5.1 > >> and it's getting reasonably late in 5.2 too. But maybe you go "guys, > >> don't be silly, this is easily fixed with this one-liner". > > > > David and I have been staring at that code today for a while together. > > I think I made some sense of it. > > One thing we weren't absolutely sure is if the old MS_MOVE behavior was > > intentional or a bug. If it is a bug we have a problem since we quite > > heavily rely on this... > > It was intentional. > > The only mounts that are locked in propagation are the mounts that > propagate together. If you see the mounts come in as individuals you > can always see/manipulate/work with the underlying mount. > > I can think of only a few ways for MNT_LOCKED to become set: > a) unshare(CLONE_NEWNS) > b) mount --rclone /path/to/mnt/tree /path/to/propagation/point > c) mount --move /path/to/mnt/tree /path/to/propgation/point > > Nothing in the target namespace should be locked on the propgation point > but all of the new mounts that came across as a unit should be locked > together. Locked together means the root of the new mount tree doesn't have MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right? Isn't the bug here that the root mount gets MNT_LOCKED as well? > > Then it breaking is definitely a regression that needs to be fixed. > > I believe the problematic change as made because the new mount > api allows attaching floating mounts. Or that was the plan last I > looked. Those floating mounts don't have a mnt_ns so will result > in a NULL pointer dereference when they are attached. Well, it's called anonymous namespace. So there *is* an mnt_ns, and its lifetime is bound to the file returned by fsmount(). Thanks, Miklos ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Regression for MS_MOVE on kernel v5.1 2019-06-13 20:25 ` Miklos Szeredi @ 2019-06-13 21:59 ` Eric W. Biederman 2019-06-13 23:37 ` Christian Brauner 0 siblings, 1 reply; 9+ messages in thread From: Eric W. Biederman @ 2019-06-13 21:59 UTC (permalink / raw) To: Miklos Szeredi Cc: Christian Brauner, Linus Torvalds, Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API, David Howells Miklos Szeredi <miklos@szeredi.hu> writes: > On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> Christian Brauner <christian@brauner.io> writes: >> >> > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote: >> >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote: >> >> > >> >> > The commit changes the internal logic to lock mounts when propagating >> >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move() >> >> > to fail at: >> >> >> >> You mean 'do_move_mount()'. >> >> >> >> > if (old->mnt.mnt_flags & MNT_LOCKED) >> >> > goto out; >> >> > >> >> > If that's indeed the case we should either revert this commit (reverts >> >> > cleanly, just tested it) or find a fix. >> >> >> >> Hmm.. I'm not entirely sure of the logic here, and just looking at >> >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on >> >> cross-userns copies") doesn't make me go "Ahh" either. >> >> >> >> Al? My gut feel is that we need to just revert, since this was in 5.1 >> >> and it's getting reasonably late in 5.2 too. But maybe you go "guys, >> >> don't be silly, this is easily fixed with this one-liner". >> > >> > David and I have been staring at that code today for a while together. >> > I think I made some sense of it. >> > One thing we weren't absolutely sure is if the old MS_MOVE behavior was >> > intentional or a bug. If it is a bug we have a problem since we quite >> > heavily rely on this... >> >> It was intentional. >> >> The only mounts that are locked in propagation are the mounts that >> propagate together. If you see the mounts come in as individuals you >> can always see/manipulate/work with the underlying mount. >> >> I can think of only a few ways for MNT_LOCKED to become set: >> a) unshare(CLONE_NEWNS) >> b) mount --rclone /path/to/mnt/tree /path/to/propagation/point >> c) mount --move /path/to/mnt/tree /path/to/propgation/point >> >> Nothing in the target namespace should be locked on the propgation point >> but all of the new mounts that came across as a unit should be locked >> together. > > Locked together means the root of the new mount tree doesn't have > MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right? > > Isn't the bug here that the root mount gets MNT_LOCKED as well? Yes, and the code to remove MNT_LOCKED is still sitting there in propogate_one right after it calls copy_tree. It should be a trivial matter of moving that change to after the lock_mnt_tree call. Now that I have been elightened about anonymous mount namespaces I am suspecting that we want to take the user_namespace of the anonymous namespace into account when deciding to lock the mounts. >> Then it breaking is definitely a regression that needs to be fixed. >> >> I believe the problematic change as made because the new mount >> api allows attaching floating mounts. Or that was the plan last I >> looked. Those floating mounts don't have a mnt_ns so will result >> in a NULL pointer dereference when they are attached. > > Well, it's called anonymous namespace. So there *is* an mnt_ns, and > its lifetime is bound to the file returned by fsmount(). Interesting. That has changed since I last saw the patches. Below is what will probably be a straight forward fix for the regression. Eric diff --git a/fs/namespace.c b/fs/namespace.c index ffb13f0562b0..a39edeecbc46 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2105,6 +2105,7 @@ static int attach_recursive_mnt(struct mount *source_mnt, /* Notice when we are propagating across user namespaces */ if (child->mnt_parent->mnt_ns->user_ns != user_ns) lock_mnt_tree(child); + child->mnt.mnt_flags &= ~MNT_LOCKED; commit_tree(child); } put_mountpoint(smp); diff --git a/fs/pnode.c b/fs/pnode.c index 7ea6cfb65077..012be405fec0 100644 --- a/fs/pnode.c +++ b/fs/pnode.c @@ -262,7 +262,6 @@ static int propagate_one(struct mount *m) child = copy_tree(last_source, last_source->mnt.mnt_root, type); if (IS_ERR(child)) return PTR_ERR(child); - child->mnt.mnt_flags &= ~MNT_LOCKED; mnt_set_mountpoint(m, mp, child); last_dest = m; last_source = child; ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: Regression for MS_MOVE on kernel v5.1 2019-06-13 21:59 ` Eric W. Biederman @ 2019-06-13 23:37 ` Christian Brauner 2019-06-14 12:08 ` Eric W. Biederman 0 siblings, 1 reply; 9+ messages in thread From: Christian Brauner @ 2019-06-13 23:37 UTC (permalink / raw) To: Eric W. Biederman Cc: Miklos Szeredi, Linus Torvalds, Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API, David Howells On Thu, Jun 13, 2019 at 04:59:24PM -0500, Eric W. Biederman wrote: > Miklos Szeredi <miklos@szeredi.hu> writes: > > > On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > >> > >> Christian Brauner <christian@brauner.io> writes: > >> > >> > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote: > >> >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote: > >> >> > > >> >> > The commit changes the internal logic to lock mounts when propagating > >> >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move() > >> >> > to fail at: > >> >> > >> >> You mean 'do_move_mount()'. > >> >> > >> >> > if (old->mnt.mnt_flags & MNT_LOCKED) > >> >> > goto out; > >> >> > > >> >> > If that's indeed the case we should either revert this commit (reverts > >> >> > cleanly, just tested it) or find a fix. > >> >> > >> >> Hmm.. I'm not entirely sure of the logic here, and just looking at > >> >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on > >> >> cross-userns copies") doesn't make me go "Ahh" either. > >> >> > >> >> Al? My gut feel is that we need to just revert, since this was in 5.1 > >> >> and it's getting reasonably late in 5.2 too. But maybe you go "guys, > >> >> don't be silly, this is easily fixed with this one-liner". > >> > > >> > David and I have been staring at that code today for a while together. > >> > I think I made some sense of it. > >> > One thing we weren't absolutely sure is if the old MS_MOVE behavior was > >> > intentional or a bug. If it is a bug we have a problem since we quite > >> > heavily rely on this... > >> > >> It was intentional. > >> > >> The only mounts that are locked in propagation are the mounts that > >> propagate together. If you see the mounts come in as individuals you > >> can always see/manipulate/work with the underlying mount. > >> > >> I can think of only a few ways for MNT_LOCKED to become set: > >> a) unshare(CLONE_NEWNS) > >> b) mount --rclone /path/to/mnt/tree /path/to/propagation/point > >> c) mount --move /path/to/mnt/tree /path/to/propgation/point > >> > >> Nothing in the target namespace should be locked on the propgation point > >> but all of the new mounts that came across as a unit should be locked > >> together. > > > > Locked together means the root of the new mount tree doesn't have > > MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right? > > > > Isn't the bug here that the root mount gets MNT_LOCKED as well? Yes, we suspected this as well. We just couldn't pinpoint where the surgery would need to start. > > Yes, and the code to remove MNT_LOCKED is still sitting there in > propogate_one right after it calls copy_tree. It should be a trivial > matter of moving that change to after the lock_mnt_tree call. > > Now that I have been elightened about anonymous mount namespaces > I am suspecting that we want to take the user_namespace of the anonymous > namespace into account when deciding to lock the mounts. > > >> Then it breaking is definitely a regression that needs to be fixed. > >> > >> I believe the problematic change as made because the new mount > >> api allows attaching floating mounts. Or that was the plan last I > >> looked. Those floating mounts don't have a mnt_ns so will result > >> in a NULL pointer dereference when they are attached. > > > > Well, it's called anonymous namespace. So there *is* an mnt_ns, and > > its lifetime is bound to the file returned by fsmount(). > > Interesting. That has changed since I last saw the patches. > > Below is what will probably be a straight forward fix for the regression. Tested the patch just now applied on top of v5.1. It fixes the regression. Can you please send a proper patch, Eric? Tested-by: Christian Brauner <christian@brauner.io> Acked-by: Christian Brauner <christian@brauner.io> > > Eric > > diff --git a/fs/namespace.c b/fs/namespace.c > index ffb13f0562b0..a39edeecbc46 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2105,6 +2105,7 @@ static int attach_recursive_mnt(struct mount *source_mnt, > /* Notice when we are propagating across user namespaces */ > if (child->mnt_parent->mnt_ns->user_ns != user_ns) > lock_mnt_tree(child); > + child->mnt.mnt_flags &= ~MNT_LOCKED; > commit_tree(child); > } > put_mountpoint(smp); > diff --git a/fs/pnode.c b/fs/pnode.c > index 7ea6cfb65077..012be405fec0 100644 > --- a/fs/pnode.c > +++ b/fs/pnode.c > @@ -262,7 +262,6 @@ static int propagate_one(struct mount *m) > child = copy_tree(last_source, last_source->mnt.mnt_root, type); > if (IS_ERR(child)) > return PTR_ERR(child); > - child->mnt.mnt_flags &= ~MNT_LOCKED; > mnt_set_mountpoint(m, mp, child); > last_dest = m; > last_source = child; > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Regression for MS_MOVE on kernel v5.1 2019-06-13 23:37 ` Christian Brauner @ 2019-06-14 12:08 ` Eric W. Biederman 0 siblings, 0 replies; 9+ messages in thread From: Eric W. Biederman @ 2019-06-14 12:08 UTC (permalink / raw) To: Christian Brauner Cc: Miklos Szeredi, Linus Torvalds, Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API, David Howells Christian Brauner <christian@brauner.io> writes: > On Thu, Jun 13, 2019 at 04:59:24PM -0500, Eric W. Biederman wrote: >> Miklos Szeredi <miklos@szeredi.hu> writes: >> >> > On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> >> >> Christian Brauner <christian@brauner.io> writes: >> >> >> >> > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote: >> >> >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote: >> >> >> > >> >> >> > The commit changes the internal logic to lock mounts when propagating >> >> >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move() >> >> >> > to fail at: >> >> >> >> >> >> You mean 'do_move_mount()'. >> >> >> >> >> >> > if (old->mnt.mnt_flags & MNT_LOCKED) >> >> >> > goto out; >> >> >> > >> >> >> > If that's indeed the case we should either revert this commit (reverts >> >> >> > cleanly, just tested it) or find a fix. >> >> >> >> >> >> Hmm.. I'm not entirely sure of the logic here, and just looking at >> >> >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on >> >> >> cross-userns copies") doesn't make me go "Ahh" either. >> >> >> >> >> >> Al? My gut feel is that we need to just revert, since this was in 5.1 >> >> >> and it's getting reasonably late in 5.2 too. But maybe you go "guys, >> >> >> don't be silly, this is easily fixed with this one-liner". >> >> > >> >> > David and I have been staring at that code today for a while together. >> >> > I think I made some sense of it. >> >> > One thing we weren't absolutely sure is if the old MS_MOVE behavior was >> >> > intentional or a bug. If it is a bug we have a problem since we quite >> >> > heavily rely on this... >> >> >> >> It was intentional. >> >> >> >> The only mounts that are locked in propagation are the mounts that >> >> propagate together. If you see the mounts come in as individuals you >> >> can always see/manipulate/work with the underlying mount. >> >> >> >> I can think of only a few ways for MNT_LOCKED to become set: >> >> a) unshare(CLONE_NEWNS) >> >> b) mount --rclone /path/to/mnt/tree /path/to/propagation/point >> >> c) mount --move /path/to/mnt/tree /path/to/propgation/point >> >> >> >> Nothing in the target namespace should be locked on the propgation point >> >> but all of the new mounts that came across as a unit should be locked >> >> together. >> > >> > Locked together means the root of the new mount tree doesn't have >> > MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right? >> > >> > Isn't the bug here that the root mount gets MNT_LOCKED as well? > > Yes, we suspected this as well. We just couldn't pinpoint where the > surgery would need to start. > >> >> Yes, and the code to remove MNT_LOCKED is still sitting there in >> propogate_one right after it calls copy_tree. It should be a trivial >> matter of moving that change to after the lock_mnt_tree call. >> >> Now that I have been elightened about anonymous mount namespaces >> I am suspecting that we want to take the user_namespace of the anonymous >> namespace into account when deciding to lock the mounts. >> >> >> Then it breaking is definitely a regression that needs to be fixed. >> >> >> >> I believe the problematic change as made because the new mount >> >> api allows attaching floating mounts. Or that was the plan last I >> >> looked. Those floating mounts don't have a mnt_ns so will result >> >> in a NULL pointer dereference when they are attached. >> > >> > Well, it's called anonymous namespace. So there *is* an mnt_ns, and >> > its lifetime is bound to the file returned by fsmount(). >> >> Interesting. That has changed since I last saw the patches. >> >> Below is what will probably be a straight forward fix for the regression. > > Tested the patch just now applied on top of v5.1. It fixes the > regression. > Can you please send a proper patch, Eric? > > Tested-by: Christian Brauner <christian@brauner.io> > Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> I will let Al or whoever take this over the finish line. I am too sleep deprived at the moment to say anything about the quality of my patch. Eric >> diff --git a/fs/namespace.c b/fs/namespace.c >> index ffb13f0562b0..a39edeecbc46 100644 >> --- a/fs/namespace.c >> +++ b/fs/namespace.c >> @@ -2105,6 +2105,7 @@ static int attach_recursive_mnt(struct mount *source_mnt, >> /* Notice when we are propagating across user namespaces */ >> if (child->mnt_parent->mnt_ns->user_ns != user_ns) >> lock_mnt_tree(child); >> + child->mnt.mnt_flags &= ~MNT_LOCKED; >> commit_tree(child); >> } >> put_mountpoint(smp); >> diff --git a/fs/pnode.c b/fs/pnode.c >> index 7ea6cfb65077..012be405fec0 100644 >> --- a/fs/pnode.c >> +++ b/fs/pnode.c >> @@ -262,7 +262,6 @@ static int propagate_one(struct mount *m) >> child = copy_tree(last_source, last_source->mnt.mnt_root, type); >> if (IS_ERR(child)) >> return PTR_ERR(child); >> - child->mnt.mnt_flags &= ~MNT_LOCKED; >> mnt_set_mountpoint(m, mp, child); >> last_dest = m; >> last_source = child; >> >> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Regression for MS_MOVE on kernel v5.1 2019-06-12 22:54 Regression for MS_MOVE on kernel v5.1 Christian Brauner 2019-06-13 4:00 ` Linus Torvalds @ 2019-06-13 9:27 ` David Howells 1 sibling, 0 replies; 9+ messages in thread From: David Howells @ 2019-06-13 9:27 UTC (permalink / raw) To: Linus Torvalds Cc: dhowells, Eric W. Biederman, Christian Brauner, Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API [Adding Eric to the cc list since he implemented MNT_LOCKED] Linus Torvalds <torvalds@linux-foundation.org> wrote: > > The commit changes the internal logic to lock mounts when propagating > > mounts (user+)mount namespaces and - I believe - causes do_mount_move() > > to fail at: > > You mean 'do_move_mount()'. > > > if (old->mnt.mnt_flags & MNT_LOCKED) > > goto out; > > > > If that's indeed the case we should either revert this commit (reverts > > cleanly, just tested it) or find a fix. > > Hmm.. I'm not entirely sure of the logic here, and just looking at > that commit 3bd045cc9c4b ("separate copying and locking mount tree on > cross-userns copies") doesn't make me go "Ahh" either. > > Al? My gut feel is that we need to just revert, since this was in 5.1 > and it's getting reasonably late in 5.2 too. But maybe you go "guys, > don't be silly, this is easily fixed with this one-liner". ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-06-14 12:08 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-06-12 22:54 Regression for MS_MOVE on kernel v5.1 Christian Brauner 2019-06-13 4:00 ` Linus Torvalds 2019-06-13 13:22 ` Christian Brauner 2019-06-13 18:34 ` Eric W. Biederman 2019-06-13 20:25 ` Miklos Szeredi 2019-06-13 21:59 ` Eric W. Biederman 2019-06-13 23:37 ` Christian Brauner 2019-06-14 12:08 ` Eric W. Biederman 2019-06-13 9:27 ` David Howells
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).