* [PATCH] fs/namespace.c: fix mountpath handling in do_lock_mount()
@ 2025-08-18 17:22 Ryan Chung
  2025-08-18 20:14 ` [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2) with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling in do_lock_mount()) Al Viro
  0 siblings, 1 reply; 4+ messages in thread
From: Ryan Chung @ 2025-08-18 17:22 UTC
To: viro, brauner, jack
Cc: linux-fsdevel, linux-kernel, linux-kernel-mentees, Ryan Chung, kernel test robot

Updates documentation for do_lock_mount() in fs/namespace.c
to clarify its parameters and return description to fix
warning reported by syzbot.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506301911.uysRaP8b-lkp@intel.com/
Signed-off-by: Ryan Chung <seokwoo.chung130@gmail.com>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ddfd4457d338..577fdff9f1a8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2741,6 +2741,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 /**
  * do_lock_mount - lock mount and mountpoint
  * @path: target path
+ * @pinned: on success, holds a pin guarding the mountpoint
  * @beneath: whether the intention is to mount beneath @path
  *
  * Follow the mount stack on @path until the top mount @mnt is found. If
@@ -2769,8 +2770,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
  * to @mnt->mnt_mp->m_dentry. But if @mnt has been unmounted it will
  * point to @mnt->mnt_root and @mnt->mnt_mp will be NULL.
  *
- * Return: Either the target mountpoint on the top mount or the top
- *         mount's mountpoint.
+ * Return: On success, 0 is returned. On failure, err is returned.
  */
 static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bool beneath)
 {
-- 
2.43.0

* [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2) with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling in do_lock_mount())
  2025-08-18 17:22 [PATCH] fs/namespace.c: fix mountpath handling in do_lock_mount() Ryan Chung
@ 2025-08-18 20:14 ` Al Viro
  2025-08-18 20:56   ` Al Viro
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2025-08-18 20:14 UTC
To: Ryan Chung
Cc: brauner, jack, linux-fsdevel, linux-kernel, linux-kernel-mentees, kernel test robot, Linus Torvalds

On Tue, Aug 19, 2025 at 02:22:35AM +0900, Ryan Chung wrote:
> Updates documentation for do_lock_mount() in fs/namespace.c
> to clarify its parameters and return description to fix
> warning reported by syzbot.
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202506301911.uysRaP8b-lkp@intel.com/
> Signed-off-by: Ryan Chung <seokwoo.chung130@gmail.com>
> ---
>  fs/namespace.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ddfd4457d338..577fdff9f1a8 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2741,6 +2741,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  /**
>   * do_lock_mount - lock mount and mountpoint
>   * @path: target path
> + * @pinned: on success, holds a pin guarding the mountpoint

I'm not sure if 'pin' is suitable here and in any case, that's not the
only problem in that description - take a look at "Return:" part in there.

The underlying problem is the semantics of function itself.  lock_mount()
assumed that it was called on the result of pathname resolution; the
question is what to do if we race with somebody mounting something on top
of the same location while we had been grabbing namespace_sem?
"Follow through to the root of whatever's been mounted on top, same as we'd done if pathname resolution happened slightly later" used to be a reasonable answer, but these days we have move_mount(2), where we have * MOVE_MOUNT_T_EMPTY_PATH combined with empty pathname, which will have us start with whatever the descriptor is pointing to, mounts or no mounts. Choosing to treat that as "follow mounts anyway" is not a big deal. * MOVE_MOUNT_BENEATH - treated as "follow mounts and slip the damn thing under the topmost one". Again, OK for non-empty pathname, but... for empty ones the rationale is weaker. Alternative would be to treat these races as "act as if we'd won and the other guy had overmounted ours", i.e. *NOT* follow mounts. Again, for old syscalls that's fine - if another thread has raced with us and mounted something on top of the place we want to mount on, it could just as easily have come *after* we'd completed mount(2) and mounted their stuff on top of ours. If userland is not fine with such outcome, it needs to provide serialization between the callers. For move_mount(2)... again, the only real question is empty to_path case. Comments? Note, BTW, that attach_recursive_mnt() used to require dest_mnt/dest_mp to be on the very top; since 6.16 it treats that as "slip it under whatever's on top of that" - that's exactly what happens in 'beneath' case. So the second alternative is easily doable these days. And it would really simplify the lock_mount()/do_lock_mount()... ^ permalink raw reply [flat|nested] 4+ messages in thread

* Re: [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2) with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling in do_lock_mount())
  2025-08-18 20:14 ` [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2) with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling in do_lock_mount()) Al Viro
@ 2025-08-18 20:56   ` Al Viro
  2025-08-19  9:40     ` Christian Brauner
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2025-08-18 20:56 UTC
To: linux-fsdevel
Cc: brauner, jack, linux-kernel, linux-kernel-mentees, Linus Torvalds, Ryan Chung

On Mon, Aug 18, 2025 at 09:14:28PM +0100, Al Viro wrote:

> Alternative would be to treat these races as "act as if we'd won and
> the other guy had overmounted ours", i.e. *NOT* follow mounts.  Again,
> for old syscalls that's fine - if another thread has raced with us and
> mounted something on top of the place we want to mount on, it could just
> as easily have come *after* we'd completed mount(2) and mounted their
> stuff on top of ours.  If userland is not fine with such outcome, it needs
> to provide serialization between the callers.  For move_mount(2)... again,
> the only real question is empty to_path case.
> 
> Comments?

Thinking about it a bit more...  Unfortunately, there's another corner
case: "." as mountpoint.  That would affect the old syscalls as well
and I'm not sure that there's no userland code that relies upon the
current behaviour.

Background: pathname resolution does *NOT* follow mounts on the starting
point and it does not follow mounts after "."

; mkdir /tmp/foo
; mount -t tmpfs none /tmp/foo
; cd /tmp/foo
; echo under > a
; cat /tmp/foo/a
under
; mount -t tmpfs none /tmp/foo
; cat a
under
; cat /tmp/foo/a
cat: /tmp/foo/a: no such file or directory
; echo under > b
; cat b
under
; cat /tmp/foo/b
cat: /tmp/foo/b: no such file or directory
;

It's been a bad decision (if it can be called that - it's been more
of an accident, AFAICT), but it's decades too late to change it.
And interaction with mount is also fun: mount(2) *DOES* follow mounts
on the end of any pathname, no matter what.  So in case when we are
standing in an overmounted directory, ls . will show the contents of
that directory, but mount <something> . will mount on top of whatever's
mounted there.

So the alternative I've mentioned above would change the behaviour of
old syscalls in a corner case that just might be actually used in userland
code - including the scripts run at the boot time, of all things ;-/

IOW, it probably falls under "can't touch that, no matter how much we'd
like to" ;-/  Pity, that...

That leaves the question of MOVE_MOUNT_BENEATH with empty pathname -
do we want a variant that would say "slide precisely under the opened
directory I gave you, no matter what might overmount it"?

At the very least this corner case needs to be documented in move_mount(2)
- behaviour of
	move_mount(_, _, dir_fd, "",
		   MOVE_MOUNT_T_EMPTY_PATH | MOVE_MOUNT_BENEATH)
has two a priori reasonable variants ("slide right under the top of
whatever pile there might be over dir_fd" and "slide right under dir_fd
itself, no matter what pile might be on top of that") and leaving it
unspecified is not good, IMO...
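
For readers who want to see the call in question spelled out, here is a
minimal userspace sketch. Assumptions are labelled: move_beneath_fd() and
beneath_empty_path_flags() are hypothetical helper names; mnt_fd is assumed
to be a detached mount obtained elsewhere (for example from fsmount(2) or
open_tree(2)), which is why MOVE_MOUNT_F_EMPTY_PATH is added on the source
side; the fallback constants repeat the values from the kernel's uapi
<linux/mount.h> and are only used if the headers in use do not provide them.
The sketch does not settle which of the two variants the kernel picks; that
is exactly the question raised above.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values as in the kernel's uapi <linux/mount.h>. */
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH	0x00000004
#endif
#ifndef MOVE_MOUNT_T_EMPTY_PATH
#define MOVE_MOUNT_T_EMPTY_PATH	0x00000040
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH	0x00000200
#endif
#ifndef SYS_move_mount
#define SYS_move_mount		429	/* x86-64, arm64 and most other architectures */
#endif

/* Flags for "empty from_path, empty to_path, mount beneath the target". */
unsigned int beneath_empty_path_flags(void)
{
	return MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH |
	       MOVE_MOUNT_BENEATH;
}

/*
 * Hypothetical helper: move the mount referred to by mnt_fd beneath the
 * location referred to by dir_fd, using empty pathnames on both sides.
 * Returns 0 on success, -1 with errno set on failure (it needs the usual
 * mount privileges, so expect EPERM when run unprivileged).
 */
int move_beneath_fd(int mnt_fd, int dir_fd)
{
	return (int)syscall(SYS_move_mount, mnt_fd, "", dir_fd, "",
			    beneath_empty_path_flags());
}
```

A caller would open the target directory with open(2) and O_DIRECTORY (or
O_PATH), pass the resulting descriptor as dir_fd, and check errno on failure.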

* Re: [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2) with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling in do_lock_mount())
  2025-08-18 20:56   ` Al Viro
@ 2025-08-19  9:40     ` Christian Brauner
  0 siblings, 0 replies; 4+ messages in thread
From: Christian Brauner @ 2025-08-19 9:40 UTC
To: Al Viro
Cc: linux-fsdevel, jack, linux-kernel, linux-kernel-mentees, Linus Torvalds, Ryan Chung

On Mon, Aug 18, 2025 at 09:56:06PM +0100, Al Viro wrote:
> On Mon, Aug 18, 2025 at 09:14:28PM +0100, Al Viro wrote:
> 
> > Alternative would be to treat these races as "act as if we'd won and
> > the other guy had overmounted ours", i.e. *NOT* follow mounts.  Again,
> > for old syscalls that's fine - if another thread has raced with us and
> > mounted something on top of the place we want to mount on, it could just
> > as easily have come *after* we'd completed mount(2) and mounted their
> > stuff on top of ours.  If userland is not fine with such outcome, it needs
> > to provide serialization between the callers.  For move_mount(2)... again,
> > the only real question is empty to_path case.
> > 
> > Comments?
> 
> Thinking about it a bit more...  Unfortunately, there's another corner
> case: "." as mountpoint.  That would affect the old syscalls as well
> and I'm not sure that there's no userland code that relies upon the
> current behaviour.
> 
> Background: pathname resolution does *NOT* follow mounts on the starting
> point and it does not follow mounts after "."
> 
> ; mkdir /tmp/foo
> ; mount -t tmpfs none /tmp/foo
> ; cd /tmp/foo
> ; echo under > a
> ; cat /tmp/foo/a
> under
> ; mount -t tmpfs none /tmp/foo
> ; cat a
> under
> ; cat /tmp/foo/a
> cat: /tmp/foo/a: no such file or directory
> ; echo under > b
> ; cat b
> under
> ; cat /tmp/foo/b
> cat: /tmp/foo/b: no such file or directory
> ;
> 
> It's been a bad decision (if it can be called that - it's been more
> of an accident, AFAICT), but it's decades too late to change it.
> And interaction with mount is also fun: mount(2) *DOES* follow mounts
> on the end of any pathname, no matter what.  So in case when we are
> standing in an overmounted directory, ls . will show the contents of
> that directory, but mount <something> . will mount on top of whatever's
> mounted there.
> 
> So the alternative I've mentioned above would change the behaviour of
> old syscalls in a corner case that just might be actually used in userland
> code - including the scripts run at the boot time, of all things ;-/
> 
> IOW, it probably falls under "can't touch that, no matter how much we'd
> like to" ;-/  Pity, that...
> 
> That leaves the question of MOVE_MOUNT_BENEATH with empty pathname -
> do we want a variant that would say "slide precisely under the opened
> directory I gave you, no matter what might overmount it"?

Afaict, right now MOVE_MOUNT_BENEATH will take the overmount into
account even for "." just like mount(2) will look up the topmost mount no
matter what. That is what userspace expects. I don't think we need a
variant where "." ignores overmounts for MOVE_MOUNT_BENEATH, certainly
not unless someone has a specific use-case for it. If it comes to that
we should probably add a new flag.

> 
> At the very least this corner case needs to be documented in move_mount(2)
> - behaviour of
> 	move_mount(_, _, dir_fd, "",
> 		   MOVE_MOUNT_T_EMPTY_PATH | MOVE_MOUNT_BENEATH)
> has two a priori reasonable variants ("slide right under the top of
> whatever pile there might be over dir_fd" and "slide right under dir_fd

Yes, that's what's intended and documented; it's also what I wrote in my
commit messages and what the selftests should test for. I specifically
did not make it deviate from standard mount(2) behavior.

> itself, no matter what pile might be on top of that") and leaving it
> unspecified is not good, IMO...

Sure, Aleksa can pull that into his documentation patches.