All of lore.kernel.org
 help / color / mirror / Atom feed
From: "xuyang2018.jy@fujitsu.com" <xuyang2018.jy@fujitsu.com>
To: Christian Brauner <brauner@kernel.org>
Cc: "david@fromorbit.com" <david@fromorbit.com>,
	"djwong@kernel.org" <djwong@kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"jlayton@kernel.org" <jlayton@kernel.org>
Subject: Re: [PATCH v3 6/7] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem
Date: Mon, 18 Apr 2022 03:05:56 +0000	[thread overview]
Message-ID: <625CE3EA.1000301@fujitsu.com> (raw)
In-Reply-To: <20220415142413.j2duwvzsniyqioyy@wittgenstein>

on 2022/4/15 22:24, Christian Brauner wrote:
> On Fri, Apr 15, 2022 at 07:02:22PM +0800, Yang Xu wrote:
>> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
>> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
>> firstly, then posxi acl setup, but xfs uses the contrary order. It will affect
>> S_ISGID clear especially we filter S_IXGRP by umask or acl.
>>
>> Regardless of which filesystem is in use, failure to strip the SGID correctly is
>> considered a security failure that needs to be fixed. The current VFS infrastructure
>> requires the filesystem to do everything right and not step on any landmines to
>> strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems
>> then don't even need to be aware that the SGID needs to be (or has been stripped) by
>> the operation the user asked to be done.
>>
>> Vfs has all the info it needs - it doesn't need the filesystems to do everything
>> correctly with the mode and ensuring that they order things like posix acl setup
>> functions correctly with inode_init_owner() to strip the SGID bit.
>>
>> Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong.
>>
>> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
>> this api may change mode.
>>
>> Only the following places use inode_init_owner
>> "hugetlbfs/inode.c:846:          inode_init_owner(&init_user_ns, inode, dir, mode);
>>   nilfs2/inode.c:354:     inode_init_owner(&init_user_ns, inode, dir, mode);
>>   zonefs/super.c:1289:    inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555);
>>   reiserfs/namei.c:619:   inode_init_owner(&init_user_ns, inode, dir, mode);
>>   jfs/jfs_inode.c:67:     inode_init_owner(&init_user_ns, inode, parent, mode);
>>   f2fs/namei.c:50:        inode_init_owner(mnt_userns, inode, dir, mode);
>>   ext2/ialloc.c:549:              inode_init_owner(&init_user_ns, inode, dir, mode);
>>   overlayfs/dir.c:643:    inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode);
>>   ufs/ialloc.c:292:       inode_init_owner(&init_user_ns, inode, dir, mode);
>>   ntfs3/inode.c:1283:     inode_init_owner(mnt_userns, inode, dir, mode);
>>   ramfs/inode.c:64:               inode_init_owner(&init_user_ns, inode, dir, mode);
>>   9p/vfs_inode.c:263:     inode_init_owner(&init_user_ns, inode, NULL, mode);
>>   btrfs/tests/btrfs-tests.c:65:   inode_init_owner(&init_user_ns, inode, NULL, S_IFREG);
>>   btrfs/inode.c:6215:     inode_init_owner(mnt_userns, inode, dir, mode);
>>   sysv/ialloc.c:166:      inode_init_owner(&init_user_ns, inode, dir, mode);
>>   omfs/inode.c:51:        inode_init_owner(&init_user_ns, inode, NULL, mode);
>>   ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode);
>>   udf/ialloc.c:108:       inode_init_owner(&init_user_ns, inode, dir, mode);
>>   ext4/ialloc.c:979:              inode_init_owner(mnt_userns, inode, dir, mode);
>>   hfsplus/inode.c:393:    inode_init_owner(&init_user_ns, inode, dir, mode);
>>   xfs/xfs_inode.c:840:            inode_init_owner(mnt_userns, inode, dir, mode);
>>   ocfs2/dlmfs/dlmfs.c:331:                inode_init_owner(&init_user_ns, inode, NULL, mode);
>>   ocfs2/dlmfs/dlmfs.c:354:        inode_init_owner(&init_user_ns, inode, parent, mode);
>>   ocfs2/namei.c:200:      inode_init_owner(&init_user_ns, inode, dir, mode);
>>   minix/bitmap.c:255:     inode_init_owner(&init_user_ns, inode, dir, mode);
>>   bfs/dir.c:99:   inode_init_owner(&init_user_ns, inode, dir, mode);
>> "
>>
>> They are used in filesystem init new inode function and these init inode functions are used
>> by following operations:
>> mkdir
>> symlink
>> mknod
>> create
>> tmpfile
>> rename
>>
>> We don't care about mkdir because we don't strip SGID bit for directory except fs.xfs.irix_sgid_inherit.
>> But we even call it in do_mkdirat() since inode_sgid_strip() will skip directories anyway. This will
>> enforce the  same ordering for all relevant operations and it will make the code more uniform and
>> easier to understand by using prepare_mode().
>>
>> symlink and rename only use valid mode that doesn't have SGID bit.
>>
>> We have added inode_sgid_strip api for the remaining operations.
>>
>> In addition to the above six operations, two filesystems has a little difference
>> 1) btrfs has btrfs_create_subvol_root to create new inode but used non SGID bit mode and can ignore
>> 2) ocfs2 reflink function should add inode_sgid_strip api manually because we don't add it in vfs
>>
>> This patch also changed grpid behaviour for ext4/xfs because the mode passed to them may been
>> changed by inode_sgid_strip.
>>
>> Also as Christian Brauner said"
>> The patch itself is useful as it would move a security sensitive operation that is currently burried in
>> individual filesystems into the vfs layer. But it has a decent regression  potential since it might strip
>> filesystems that have so far relied on getting the S_ISGID bit with a mode argument. So this needs a lot
>> of testing and long exposure in -next for at least one full kernel cycle."
>>
>> Suggested-by: Dave Chinner<david@fromorbit.com>
>> Signed-off-by: Yang Xu<xuyang2018.jy@fujitsu.com>
>> ---
>> v2->v3:
>> 1.use new helper prepare_mode to do inode sgid strip and umask strip
>> 2.also use prepare_mode() for mkdirat
>>   fs/inode.c       |  2 --
>>   fs/namei.c       | 14 +++++---------
>>   fs/ocfs2/namei.c |  1 +
>>   3 files changed, 6 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index 1b569ad882ce..a250aa01d3c3 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -2246,8 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
>>   		/* Directories are special, and always inherit S_ISGID */
>>   		if (S_ISDIR(mode))
>>   			mode |= S_ISGID;
>> -		else
>> -			inode_sgid_strip(mnt_userns, dir,&mode);
>>   	} else
>>   		inode_fsgid_set(inode, mnt_userns);
>>   	inode->i_mode = mode;
>> diff --git a/fs/namei.c b/fs/namei.c
>> index bbc7c950bbdc..0fadc884af7f 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -3287,8 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>>   	if (open_flag&  O_CREAT) {
>>   		if (open_flag&  O_EXCL)
>>   			open_flag&= ~O_TRUNC;
>> -		if (!IS_POSIXACL(dir->d_inode))
>> -			mode&= ~current_umask();
>> +		prepare_mode(mnt_userns, dir->d_inode,&mode);
>>   		if (likely(got_write))
>>   			create_error = may_o_create(mnt_userns,&nd->path,
>>   						    dentry, mode);
>> @@ -3521,8 +3520,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
>>   	child = d_alloc(dentry,&slash_name);
>>   	if (unlikely(!child))
>>   		goto out_err;
>> -	if (!IS_POSIXACL(dir))
>> -		mode&= ~current_umask();
>> +	prepare_mode(mnt_userns, dir,&mode);
>>   	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
>>   	if (error)
>>   		goto out_err;
>> @@ -3852,13 +3850,12 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
>>   	if (IS_ERR(dentry))
>>   		goto out1;
>>
>> -	if (!IS_POSIXACL(path.dentry->d_inode))
>> -		mode&= ~current_umask();
>> +	mnt_userns = mnt_user_ns(path.mnt);
>> +	prepare_mode(mnt_userns, path.dentry->d_inode,&mode);
>>   	error = security_path_mknod(&path, dentry, mode, dev);
>>   	if (error)
>>   		goto out2;
>>
>> -	mnt_userns = mnt_user_ns(path.mnt);
>>   	switch (mode&  S_IFMT) {
>>   		case 0: case S_IFREG:
>>   			error = vfs_create(mnt_userns, path.dentry->d_inode,
>> @@ -3952,12 +3949,11 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
>>   	if (IS_ERR(dentry))
>>   		goto out_putname;
>>
>> -	if (!IS_POSIXACL(path.dentry->d_inode))
>> -		mode&= ~current_umask();
>>   	error = security_path_mkdir(&path, dentry, mode);
>
> Your changes causes the security and the filesystem layer to potentially
> see different values for mode.
>
> You need to change the patch so prepare_mode() is called before the mode
> is passed to the security layer. This will ensure that both the security
> layer and the vfs see the same mode.

Will move it before security_path_mkdir.

Best Regards
Yang Xu
>
>>   	if (!error) {
>>   		struct user_namespace *mnt_userns;
>>   		mnt_userns = mnt_user_ns(path.mnt);
>> +		prepare_mode(mnt_userns, path.dentry->d_inode,&mode);
>>   		error = vfs_mkdir(mnt_userns, path.dentry->d_inode, dentry,
>>   				  mode);
>>   	}
>> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
>> index c75fd54b9185..c81b8e0847aa 100644
>> --- a/fs/ocfs2/namei.c
>> +++ b/fs/ocfs2/namei.c
>> @@ -198,6 +198,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
>>   	if (S_ISDIR(mode))
>>   		set_nlink(inode, 2);
>>   	inode_init_owner(&init_user_ns, inode, dir, mode);
>> +	inode_sgid_strip(&init_user_ns, dir,&mode);
>>   	status = dquot_initialize(inode);
>>   	if (status)
>>   		return ERR_PTR(status);
>> --
>> 2.27.0
>>

  reply	other threads:[~2022-04-18  3:07 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-15 11:02 [PATCH v3 1/7] fs/inode: move sgid strip operation from inode_init_owner into inode_sgid_strip Yang Xu
2022-04-15 11:02 ` [PATCH v3 2/7] fs/namei.c: Add missing umask strip in vfs_tmpfile Yang Xu
2022-04-15 14:17   ` Christian Brauner
2022-04-18  2:55     ` xuyang2018.jy
2022-04-15 11:02 ` [PATCH v3 3/7] xfs: Only do posix acl setup/release operation under CONFIG_XFS_POSIX_ACL Yang Xu
2022-04-15 11:02 ` [PATCH v3 4/7] nfs3: Only do posix acl setup/release operation under CONFIG_NFS_V3_ACL Yang Xu
2022-04-15 11:02 ` [PATCH v3 5/7] fs: Add new helper prepare_mode Yang Xu
2022-04-15 14:19   ` Christian Brauner
2022-04-15 11:02 ` [PATCH v3 6/7] fs: strip file's S_ISGID mode on vfs instead of on underlying filesystem Yang Xu
2022-04-15 14:24   ` Christian Brauner
2022-04-18  3:05     ` xuyang2018.jy [this message]
2022-04-15 11:02 ` [PATCH v3 7/7] ceph: Remove S_ISGID clear code in ceph_finish_async_create Yang Xu
2022-04-18  3:04   ` Xiubo Li
2022-04-18  3:12     ` xuyang2018.jy
2022-04-15 14:09 ` [PATCH v3 1/7] fs/inode: move sgid strip operation from inode_init_owner into inode_sgid_strip Christian Brauner
2022-04-18  2:08   ` xuyang2018.jy
2022-04-18  3:08   ` Matthew Wilcox
2022-04-18  8:39     ` xuyang2018.jy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=625CE3EA.1000301@fujitsu.com \
    --to=xuyang2018.jy@fujitsu.com \
    --cc=brauner@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.