[PATCH] fhandle: use more consistent rules for decoding file handle from userns

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] fhandle: use more consistent rules for decoding file handle from userns
@ 2025-08-27 19:43 Amir Goldstein
  2025-08-29  7:48 ` Christian Brauner
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Amir Goldstein @ 2025-08-27 19:43 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Chuck Lever, Jeff Layton, linux-fsdevel

Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
checks") relaxed the coditions for decoding a file handle from non init
userns.

The conditions are that that decoded dentry is accessible from the user
provided mountfd (or to fs root) and that all the ancestors along the
path have a valid id mapping in the userns.

These conditions are intentionally more strict than the condition that
the decoded dentry should be "lookable" by path from the mountfd.

For example, the path /home/amir/dir/subdir is lookable by path from
unpriv userns of user amir, because /home perms is 755, but the owner of
/home does not have a valid id mapping in unpriv userns of user amir.

The current code did not check that the decoded dentry itself has a
valid id mapping in the userns.  There is no security risk in that,
because that final open still performs the needed permission checks,
but this is inconsistent with the checks performed on the ancestors,
so the behavior can be a bit confusing.

Add the check for the decoded dentry itself, so that the entire path,
including the last component has a valid id mapping in the userns.

Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fhandle.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/fhandle.c b/fs/fhandle.c
index 68a7d2861c58f..a907ddfac4d51 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -207,6 +207,14 @@ static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
 	if (!ctx->flags)
 		return 1;

+	/*
+	 * Verify that the decoded dentry itself has a valid id mapping.
+	 * In case the decoded dentry is the mountfd root itself, this
+	 * verifies that the mountfd inode itself has a valid id mapping.
+	 */
+	if (!privileged_wrt_inode_uidgid(user_ns, idmap, d_inode(dentry)))
+		return 0;
+
 	/*
 	 * It's racy as we're not taking rename_lock but we're able to ignore
 	 * permissions and we just need an approximation whether we were able
-- 
2.50.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fhandle: use more consistent rules for decoding file handle from userns
  2025-08-27 19:43 [PATCH] fhandle: use more consistent rules for decoding file handle from userns Amir Goldstein
@ 2025-08-29  7:48 ` Christian Brauner
  2025-08-29 10:50 ` Jan Kara
  2025-09-01 11:10 ` Jeff Layton
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2025-08-29  7:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Chuck Lever, Jeff Layton,
	linux-fsdevel

On Wed, 27 Aug 2025 21:43:09 +0200, Amir Goldstein wrote:
> Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
> checks") relaxed the coditions for decoding a file handle from non init
> userns.
> 
> The conditions are that that decoded dentry is accessible from the user
> provided mountfd (or to fs root) and that all the ancestors along the
> path have a valid id mapping in the userns.
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] fhandle: use more consistent rules for decoding file handle from userns
      https://git.kernel.org/vfs/vfs/c/bb585591ebf0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fhandle: use more consistent rules for decoding file handle from userns
  2025-08-27 19:43 [PATCH] fhandle: use more consistent rules for decoding file handle from userns Amir Goldstein
  2025-08-29  7:48 ` Christian Brauner
@ 2025-08-29 10:50 ` Jan Kara
  2025-08-29 12:55   ` Amir Goldstein
  2025-09-01 11:10 ` Jeff Layton
  2 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2025-08-29 10:50 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Chuck Lever, Jeff Layton,
	linux-fsdevel

On Wed 27-08-25 21:43:09, Amir Goldstein wrote:
> Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
> checks") relaxed the coditions for decoding a file handle from non init
> userns.
> 
> The conditions are that that decoded dentry is accessible from the user
> provided mountfd (or to fs root) and that all the ancestors along the
> path have a valid id mapping in the userns.
> 
> These conditions are intentionally more strict than the condition that
> the decoded dentry should be "lookable" by path from the mountfd.
> 
> For example, the path /home/amir/dir/subdir is lookable by path from
> unpriv userns of user amir, because /home perms is 755, but the owner of
> /home does not have a valid id mapping in unpriv userns of user amir.
> 
> The current code did not check that the decoded dentry itself has a
> valid id mapping in the userns.  There is no security risk in that,
> because that final open still performs the needed permission checks,
> but this is inconsistent with the checks performed on the ancestors,
> so the behavior can be a bit confusing.
> 
> Add the check for the decoded dentry itself, so that the entire path,
> including the last component has a valid id mapping in the userns.
> 
> Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Yeah, probably it's less surprising this way. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fhandle.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/fhandle.c b/fs/fhandle.c
> index 68a7d2861c58f..a907ddfac4d51 100644
> --- a/fs/fhandle.c
> +++ b/fs/fhandle.c
> @@ -207,6 +207,14 @@ static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
>  	if (!ctx->flags)
>  		return 1;
>  
> +	/*
> +	 * Verify that the decoded dentry itself has a valid id mapping.
> +	 * In case the decoded dentry is the mountfd root itself, this
> +	 * verifies that the mountfd inode itself has a valid id mapping.
> +	 */
> +	if (!privileged_wrt_inode_uidgid(user_ns, idmap, d_inode(dentry)))
> +		return 0;
> +
>  	/*
>  	 * It's racy as we're not taking rename_lock but we're able to ignore
>  	 * permissions and we just need an approximation whether we were able
> -- 
> 2.50.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fhandle: use more consistent rules for decoding file handle from userns
  2025-08-29 10:50 ` Jan Kara
@ 2025-08-29 12:55   ` Amir Goldstein
  2025-09-01  9:44     ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2025-08-29 12:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christian Brauner, Chuck Lever, Jeff Layton, linux-fsdevel

On Fri, Aug 29, 2025 at 12:50 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 27-08-25 21:43:09, Amir Goldstein wrote:
> > Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
> > checks") relaxed the coditions for decoding a file handle from non init
> > userns.
> >
> > The conditions are that that decoded dentry is accessible from the user
> > provided mountfd (or to fs root) and that all the ancestors along the
> > path have a valid id mapping in the userns.
> >
> > These conditions are intentionally more strict than the condition that
> > the decoded dentry should be "lookable" by path from the mountfd.
> >
> > For example, the path /home/amir/dir/subdir is lookable by path from
> > unpriv userns of user amir, because /home perms is 755, but the owner of
> > /home does not have a valid id mapping in unpriv userns of user amir.
> >
> > The current code did not check that the decoded dentry itself has a
> > valid id mapping in the userns.  There is no security risk in that,
> > because that final open still performs the needed permission checks,
> > but this is inconsistent with the checks performed on the ancestors,
> > so the behavior can be a bit confusing.
> >
> > Add the check for the decoded dentry itself, so that the entire path,
> > including the last component has a valid id mapping in the userns.
> >
> > Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> Yeah, probably it's less surprising this way. Feel free to add:
>

BTW, Jan, I was trying to think about whether we could do
something useful with privileged_wrt_inode_uidgid() for filtering
events that we queue by group->user_ns.

Then users could allow something like:
1. Admin sets up privileged fanotify fd and filesystem watch on
    /home filesystem
2. Enters userns of amir and does ioctl to change group->user_ns
    to user ns of amir
3. Hands over fanotify fd to monitor process running in amir's userns
4. amir's monitor process gets all events on filesystem /home
    whose directory and object uid/gid are mappable to amir's userns
5. With properly configured systems, that we be all the files/dirs under
    /home/amir

I have posted several POCs in the past trying different approaches
for filtering by userns, but I have never tried to take this approach.

Compared to subtree filtering, this could be quite pragmatic? Hmm?

The difference from subtree filtering is that it shifts the responsibility
of making sure that /home/amir and /home/jack have files with uid,gid
in different ranges to the OS/runtime, which is a responsibility that
some systems are already taking care of anyway.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fhandle: use more consistent rules for decoding file handle from userns
  2025-08-29 12:55   ` Amir Goldstein
@ 2025-09-01  9:44     ` Jan Kara
  2025-09-01 17:47       ` Amir Goldstein
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2025-09-01  9:44 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Christian Brauner, Chuck Lever, Jeff Layton,
	linux-fsdevel

On Fri 29-08-25 14:55:13, Amir Goldstein wrote:
> On Fri, Aug 29, 2025 at 12:50 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 27-08-25 21:43:09, Amir Goldstein wrote:
> > > Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
> > > checks") relaxed the coditions for decoding a file handle from non init
> > > userns.
> > >
> > > The conditions are that that decoded dentry is accessible from the user
> > > provided mountfd (or to fs root) and that all the ancestors along the
> > > path have a valid id mapping in the userns.
> > >
> > > These conditions are intentionally more strict than the condition that
> > > the decoded dentry should be "lookable" by path from the mountfd.
> > >
> > > For example, the path /home/amir/dir/subdir is lookable by path from
> > > unpriv userns of user amir, because /home perms is 755, but the owner of
> > > /home does not have a valid id mapping in unpriv userns of user amir.
> > >
> > > The current code did not check that the decoded dentry itself has a
> > > valid id mapping in the userns.  There is no security risk in that,
> > > because that final open still performs the needed permission checks,
> > > but this is inconsistent with the checks performed on the ancestors,
> > > so the behavior can be a bit confusing.
> > >
> > > Add the check for the decoded dentry itself, so that the entire path,
> > > including the last component has a valid id mapping in the userns.
> > >
> > > Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > Yeah, probably it's less surprising this way. Feel free to add:
> >
> 
> BTW, Jan, I was trying to think about whether we could do
> something useful with privileged_wrt_inode_uidgid() for filtering
> events that we queue by group->user_ns.
> 
> Then users could allow something like:
> 1. Admin sets up privileged fanotify fd and filesystem watch on
>     /home filesystem
> 2. Enters userns of amir and does ioctl to change group->user_ns
>     to user ns of amir
> 3. Hands over fanotify fd to monitor process running in amir's userns
> 4. amir's monitor process gets all events on filesystem /home
>     whose directory and object uid/gid are mappable to amir's userns
> 5. With properly configured systems, that we be all the files/dirs under
>     /home/amir
> 
> I have posted several POCs in the past trying different approaches
> for filtering by userns, but I have never tried to take this approach.
> 
> Compared to subtree filtering, this could be quite pragmatic? Hmm?

This is definitely relatively easy to implement in the kernel. I'm just not
sure about two things:

1) Will this be easy enough to use from userspace so that it will get used?
Mount watches have been created as a "partial" solution for subtree watches
as well. But in practice it didn't get very widespread use as subtree watch
replacement because setting up a mountpoint for subtree you want to watch is
not flexible enough. Setting up userns and id mappings and proper inode
ownership seems like a similar hassle for anything else than a full home
dir as well...

2) Filtering all events on the fs only by inode owner being mappable to
user ns looks somewhat dangerous to me. Sure you offload the responsibility
of the safe setup to userspace but the fact that this completely bypasses
any permission checks means that configuring the system so that it does not
leak any unintended information (like filenames or facts that some things
have changed user otherwise wouldn't be able to see) might be difficult.
Consider if e.g. maildir is on your monitored fs and for some reason the
UID of the postfix is mapped to your user ns (e.g. because the user needs
access to some file/dir managed by postfix). Then you could monitor all
fs activity of postfix possibly learning about emails to other persons in
the system.

> The difference from subtree filtering is that it shifts the responsibility
> of making sure that /home/amir and /home/jack have files with uid,gid
> in different ranges to the OS/runtime, which is a responsibility that
> some systems are already taking care of anyway.

At this point I'm not convinced there are that many systems where this way
of filtering would be useful but I could be wrong. The fact that some ID is
mappable in a namespace looks as kind of weak restriction because you may
need to map into the namespace various external "system" ids AFAIU. But I
can see that e.g. for containers the idea of restricting events to inodes
whose owners are in a range of UIDs may be attractive.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fhandle: use more consistent rules for decoding file handle from userns
  2025-08-27 19:43 [PATCH] fhandle: use more consistent rules for decoding file handle from userns Amir Goldstein
  2025-08-29  7:48 ` Christian Brauner
  2025-08-29 10:50 ` Jan Kara
@ 2025-09-01 11:10 ` Jeff Layton
  2 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2025-09-01 11:10 UTC (permalink / raw)
  To: Amir Goldstein, Christian Brauner; +Cc: Jan Kara, Chuck Lever, linux-fsdevel

On Wed, 2025-08-27 at 21:43 +0200, Amir Goldstein wrote:
> Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
> checks") relaxed the coditions for decoding a file handle from non init
> userns.
> 
> The conditions are that that decoded dentry is accessible from the user
> provided mountfd (or to fs root) and that all the ancestors along the
> path have a valid id mapping in the userns.
> 
> These conditions are intentionally more strict than the condition that
> the decoded dentry should be "lookable" by path from the mountfd.
> 
> For example, the path /home/amir/dir/subdir is lookable by path from
> unpriv userns of user amir, because /home perms is 755, but the owner of
> /home does not have a valid id mapping in unpriv userns of user amir.
> 
> The current code did not check that the decoded dentry itself has a
> valid id mapping in the userns.  There is no security risk in that,
> because that final open still performs the needed permission checks,
> but this is inconsistent with the checks performed on the ancestors,
> so the behavior can be a bit confusing.
> 
> Add the check for the decoded dentry itself, so that the entire path,
> including the last component has a valid id mapping in the userns.
> 
> Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/fhandle.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/fhandle.c b/fs/fhandle.c
> index 68a7d2861c58f..a907ddfac4d51 100644
> --- a/fs/fhandle.c
> +++ b/fs/fhandle.c
> @@ -207,6 +207,14 @@ static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
>  	if (!ctx->flags)
>  		return 1;
>  
> +	/*
> +	 * Verify that the decoded dentry itself has a valid id mapping.
> +	 * In case the decoded dentry is the mountfd root itself, this
> +	 * verifies that the mountfd inode itself has a valid id mapping.
> +	 */
> +	if (!privileged_wrt_inode_uidgid(user_ns, idmap, d_inode(dentry)))
> +		return 0;
> +
>  	/*
>  	 * It's racy as we're not taking rename_lock but we're able to ignore
>  	 * permissions and we just need an approximation whether we were able

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fhandle: use more consistent rules for decoding file handle from userns
  2025-09-01  9:44     ` Jan Kara
@ 2025-09-01 17:47       ` Amir Goldstein
  2025-09-04 11:23         ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2025-09-01 17:47 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christian Brauner, Chuck Lever, Jeff Layton, linux-fsdevel

On Mon, Sep 1, 2025 at 11:44 AM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 29-08-25 14:55:13, Amir Goldstein wrote:
> > On Fri, Aug 29, 2025 at 12:50 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Wed 27-08-25 21:43:09, Amir Goldstein wrote:
> > > > Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
> > > > checks") relaxed the coditions for decoding a file handle from non init
> > > > userns.
> > > >
> > > > The conditions are that that decoded dentry is accessible from the user
> > > > provided mountfd (or to fs root) and that all the ancestors along the
> > > > path have a valid id mapping in the userns.
> > > >
> > > > These conditions are intentionally more strict than the condition that
> > > > the decoded dentry should be "lookable" by path from the mountfd.
> > > >
> > > > For example, the path /home/amir/dir/subdir is lookable by path from
> > > > unpriv userns of user amir, because /home perms is 755, but the owner of
> > > > /home does not have a valid id mapping in unpriv userns of user amir.
> > > >
> > > > The current code did not check that the decoded dentry itself has a
> > > > valid id mapping in the userns.  There is no security risk in that,
> > > > because that final open still performs the needed permission checks,
> > > > but this is inconsistent with the checks performed on the ancestors,
> > > > so the behavior can be a bit confusing.
> > > >
> > > > Add the check for the decoded dentry itself, so that the entire path,
> > > > including the last component has a valid id mapping in the userns.
> > > >
> > > > Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > >
> > > Yeah, probably it's less surprising this way. Feel free to add:
> > >
> >
> > BTW, Jan, I was trying to think about whether we could do
> > something useful with privileged_wrt_inode_uidgid() for filtering
> > events that we queue by group->user_ns.
> >
> > Then users could allow something like:
> > 1. Admin sets up privileged fanotify fd and filesystem watch on
> >     /home filesystem
> > 2. Enters userns of amir and does ioctl to change group->user_ns
> >     to user ns of amir
> > 3. Hands over fanotify fd to monitor process running in amir's userns
> > 4. amir's monitor process gets all events on filesystem /home
> >     whose directory and object uid/gid are mappable to amir's userns
> > 5. With properly configured systems, that we be all the files/dirs under
> >     /home/amir
> >
> > I have posted several POCs in the past trying different approaches
> > for filtering by userns, but I have never tried to take this approach.
> >
> > Compared to subtree filtering, this could be quite pragmatic? Hmm?
>
> This is definitely relatively easy to implement in the kernel. I'm just not
> sure about two things:
>
> 1) Will this be easy enough to use from userspace so that it will get used?
> Mount watches have been created as a "partial" solution for subtree watches
> as well. But in practice it didn't get very widespread use as subtree watch
> replacement because setting up a mountpoint for subtree you want to watch is
> not flexible enough. Setting up userns and id mappings and proper inode
> ownership seems like a similar hassle for anything else than a full home
> dir as well...

I would not suggest this if it were not for systemd-mountfsd which is
designed to allow non-root users to mount "trusted" images (e.g. ext4).

I don't think this feature is already implemented, but an image auto
generated for the user per demand by mkfs, should also be "trusted".

In theory, as user jack, you should be able to spawn an unpriv userns
wherein user jack is uid 0 and get a mount of a freshly formatted ext4 fs
idmapped in a way that only uids from the userns private range could
write to that fs.

*if* this is possible and useful to users, then we will start seeing in the wild
filesystems where all the inodes are owned by a private range of uids,
all mappable to a specific userns.

But TBH, I am not sure if this is already a reality or a likely future or not.
I need to dig some more to understand the future plans for
systemd-mountfsd use cases.

>
> 2) Filtering all events on the fs only by inode owner being mappable to
> user ns looks somewhat dangerous to me. Sure you offload the responsibility
> of the safe setup to userspace but the fact that this completely bypasses
> any permission checks means that configuring the system so that it does not
> leak any unintended information (like filenames or facts that some things
> have changed user otherwise wouldn't be able to see) might be difficult.
> Consider if e.g. maildir is on your monitored fs and for some reason the
> UID of the postfix is mapped to your user ns (e.g. because the user needs
> access to some file/dir managed by postfix). Then you could monitor all
> fs activity of postfix possibly learning about emails to other persons in
> the system.
>

Well, the rule should be that the user setting group->user_ns is ADMIN
in that userns.

If someone has creates a userns where user amir is uid 0 and also
mapped user postfix into the userns of amir, then that gives user amir
full privs to access and modify user postfix owned files, so the privilege
escalation, to the best of my understanding, has already happened way
before user amir started the fanotify monitor.

> > The difference from subtree filtering is that it shifts the responsibility
> > of making sure that /home/amir and /home/jack have files with uid,gid
> > in different ranges to the OS/runtime, which is a responsibility that
> > some systems are already taking care of anyway.
>
> At this point I'm not convinced there are that many systems where this way
> of filtering would be useful but I could be wrong. The fact that some ID is
> mappable in a namespace looks as kind of weak restriction because you may
> need to map into the namespace various external "system" ids AFAIU. But I
> can see that e.g. for containers the idea of restricting events to inodes
> whose owners are in a range of UIDs may be attractive.

I think that for "system containers" (i.e. a nested OS) this could be
attractive, but I don't feel that I know enough to make an authoritative
statement about this.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fhandle: use more consistent rules for decoding file handle from userns
  2025-09-01 17:47       ` Amir Goldstein
@ 2025-09-04 11:23         ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2025-09-04 11:23 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Christian Brauner, Chuck Lever, Jeff Layton,
	linux-fsdevel

On Mon 01-09-25 19:47:51, Amir Goldstein wrote:
> On Mon, Sep 1, 2025 at 11:44 AM Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 29-08-25 14:55:13, Amir Goldstein wrote:
> > > On Fri, Aug 29, 2025 at 12:50 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Wed 27-08-25 21:43:09, Amir Goldstein wrote:
> > > > > Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
> > > > > checks") relaxed the coditions for decoding a file handle from non init
> > > > > userns.
> > > > >
> > > > > The conditions are that that decoded dentry is accessible from the user
> > > > > provided mountfd (or to fs root) and that all the ancestors along the
> > > > > path have a valid id mapping in the userns.
> > > > >
> > > > > These conditions are intentionally more strict than the condition that
> > > > > the decoded dentry should be "lookable" by path from the mountfd.
> > > > >
> > > > > For example, the path /home/amir/dir/subdir is lookable by path from
> > > > > unpriv userns of user amir, because /home perms is 755, but the owner of
> > > > > /home does not have a valid id mapping in unpriv userns of user amir.
> > > > >
> > > > > The current code did not check that the decoded dentry itself has a
> > > > > valid id mapping in the userns.  There is no security risk in that,
> > > > > because that final open still performs the needed permission checks,
> > > > > but this is inconsistent with the checks performed on the ancestors,
> > > > > so the behavior can be a bit confusing.
> > > > >
> > > > > Add the check for the decoded dentry itself, so that the entire path,
> > > > > including the last component has a valid id mapping in the userns.
> > > > >
> > > > > Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > >
> > > > Yeah, probably it's less surprising this way. Feel free to add:
> > > >
> > >
> > > BTW, Jan, I was trying to think about whether we could do
> > > something useful with privileged_wrt_inode_uidgid() for filtering
> > > events that we queue by group->user_ns.
> > >
> > > Then users could allow something like:
> > > 1. Admin sets up privileged fanotify fd and filesystem watch on
> > >     /home filesystem
> > > 2. Enters userns of amir and does ioctl to change group->user_ns
> > >     to user ns of amir
> > > 3. Hands over fanotify fd to monitor process running in amir's userns
> > > 4. amir's monitor process gets all events on filesystem /home
> > >     whose directory and object uid/gid are mappable to amir's userns
> > > 5. With properly configured systems, that we be all the files/dirs under
> > >     /home/amir
> > >
> > > I have posted several POCs in the past trying different approaches
> > > for filtering by userns, but I have never tried to take this approach.
> > >
> > > Compared to subtree filtering, this could be quite pragmatic? Hmm?
> >
> > This is definitely relatively easy to implement in the kernel. I'm just not
> > sure about two things:
> >
> > 1) Will this be easy enough to use from userspace so that it will get used?
> > Mount watches have been created as a "partial" solution for subtree watches
> > as well. But in practice it didn't get very widespread use as subtree watch
> > replacement because setting up a mountpoint for subtree you want to watch is
> > not flexible enough. Setting up userns and id mappings and proper inode
> > ownership seems like a similar hassle for anything else than a full home
> > dir as well...
> 
> I would not suggest this if it were not for systemd-mountfsd which is
> designed to allow non-root users to mount "trusted" images (e.g. ext4).
> 
> I don't think this feature is already implemented, but an image auto
> generated for the user per demand by mkfs, should also be "trusted".
> 
> In theory, as user jack, you should be able to spawn an unpriv userns
> wherein user jack is uid 0 and get a mount of a freshly formatted ext4 fs
> idmapped in a way that only uids from the userns private range could
> write to that fs.

Ah, I see. Yes, I've heard of similar plans in systemd land.

> *if* this is possible and useful to users, then we will start seeing in
> the wild filesystems where all the inodes are owned by a private range of
> uids, all mappable to a specific userns.

Right. But I expect that the sb->s_user_ns will point to the user's
namespace in that case? So that all the possibly preexisting fs content
gets properly mapped to ids available to the user? If that's the case we'd
already allow placing filesystem mark on such superblocks and there's no
need for filtering?

But I think your original usecase mentioned a different situation with a
filesystem shared by multiple users (/home) but additional idmapping set in
the user namespace where the process is running.

> But TBH, I am not sure if this is already a reality or a likely future or not.
> I need to dig some more to understand the future plans for
> systemd-mountfsd use cases.
> 
> > 2) Filtering all events on the fs only by inode owner being mappable to
> > user ns looks somewhat dangerous to me. Sure you offload the responsibility
> > of the safe setup to userspace but the fact that this completely bypasses
> > any permission checks means that configuring the system so that it does not
> > leak any unintended information (like filenames or facts that some things
> > have changed user otherwise wouldn't be able to see) might be difficult.
> > Consider if e.g. maildir is on your monitored fs and for some reason the
> > UID of the postfix is mapped to your user ns (e.g. because the user needs
> > access to some file/dir managed by postfix). Then you could monitor all
> > fs activity of postfix possibly learning about emails to other persons in
> > the system.
> 
> Well, the rule should be that the user setting group->user_ns is ADMIN
> in that userns.
> 
> If someone has creates a userns where user amir is uid 0 and also
> mapped user postfix into the userns of amir, then that gives user amir
> full privs to access and modify user postfix owned files, so the privilege
> escalation, to the best of my understanding, has already happened way
> before user amir started the fanotify monitor.

Right, sorry, I didn't quite think this through. Indeed as I've checked
e.g. Kubernetes uses disjoint ranges of UIDs to map into user namespaces of
different containers. So filtering filesystem events to inodes whose id is
mappable in such user NS should be OK. But it would be good to verify with
somebody who has more experience with this namespacing stuff than me :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-09-04 11:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-27 19:43 [PATCH] fhandle: use more consistent rules for decoding file handle from userns Amir Goldstein
2025-08-29  7:48 ` Christian Brauner
2025-08-29 10:50 ` Jan Kara
2025-08-29 12:55   ` Amir Goldstein
2025-09-01  9:44     ` Jan Kara
2025-09-01 17:47       ` Amir Goldstein
2025-09-04 11:23         ` Jan Kara
2025-09-01 11:10 ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).