* [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS
@ 2025-05-19 16:11 David Howells
2025-05-19 16:11 ` [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir David Howells
2025-05-19 16:11 ` [PATCH 2/2] vfs: Fix inode ownership checks with regard to foreign ownership David Howells
0 siblings, 2 replies; 9+ messages in thread
From: David Howells @ 2025-05-19 16:11 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Marc Dionne, linux-afs, linux-fsdevel,
linux-kernel
Hi Christian,
Here's a pair of fixes that deal with some places the VFS mishandles
foreign user ID checks. By "foreign" I mean that the user IDs from the
filesystem do not belong in the same number space as the system's user IDs.
Network filesystems are prime examples of this, but it may also impact
things like USB drives or cdroms.
Take AFS as example: Whilst each file does have a numeric user ID, the file
may be accessed from a world-accessible public-facing server from some
other organisation with its own idea of what that user ID refers to. IDs
from AFS may also collide with the system's own set of IDs and may also be
unrepresentable as a 32-bit UID (in the case of AuriStor servers).
Further, kAFS uses a key containing an authentication token to specify the
subject doing an RPC operation to the server - and, as such, this needs to
be used instead of current_fsuid() in determining whether the current user
has ownership rights over a file.
Additionally, filesystems (CIFS being a notable example) may also have user
identifiers that aren't simple integers.
Now the problem in the VFS is that there are a number of places where it
assumes it can directly compare i_uid (possibly id-mapped) to either than
on another inode or a UID drawn from elsewhere (e.g. current_uid()) - but
this doesn't work right.
This causes the write-to-sticky check to work incorrectly for AFS (though
this is currently masked by a workaround in bash that is slated to be
removed) whereby open(O_CREAT) of such a file will fail when it shouldn't.
Two patches are provided: The first specifically fixes the bash workaround
issue, delegating the check as to whether two inodes have the same owner
and the check as to whether the current user owns an inode to the
filesystem.
AFS then uses the result of a status-fetch with a suitable key to determine
file ownership and just compares the 64-bit owner IDs to determine if two
inodes have the same ownership.
The second patch expands the use of the VFS helper functions added by the
first to other VFS UID checks.
The patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-fixes
Thanks,
David
David Howells (2):
afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
vfs: Fix inode ownership checks with regard to foreign ownership
fs/afs/dir.c | 2 ++
fs/afs/file.c | 2 ++
fs/afs/internal.h | 3 ++
fs/afs/security.c | 52 ++++++++++++++++++++++++++++++
fs/attr.c | 58 ++++++++++++++++++++-------------
fs/coredump.c | 3 +-
fs/inode.c | 8 +++--
fs/internal.h | 1 +
fs/locks.c | 7 ++--
fs/namei.c | 80 ++++++++++++++++++++++++++++++++--------------
fs/remap_range.c | 20 ++++++------
include/linux/fs.h | 3 ++
12 files changed, 177 insertions(+), 62 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
2025-05-19 16:11 [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS David Howells
@ 2025-05-19 16:11 ` David Howells
2025-05-30 0:11 ` Jeffrey E Altman
2025-05-19 16:11 ` [PATCH 2/2] vfs: Fix inode ownership checks with regard to foreign ownership David Howells
1 sibling, 1 reply; 9+ messages in thread
From: David Howells @ 2025-05-19 16:11 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Marc Dionne, linux-afs, linux-fsdevel,
linux-kernel, Etienne Champetier, Jeffrey Altman, Chet Ramey,
Cheyenne Wills, Alexander Viro, Christian Brauner, Steve French,
openafs-devel, linux-cifs
Since version 1.11 (January 1992) Bash has a work around in redir_open()
that causes open(O_CREAT) of a file to be retried without O_CREAT if open()
fails with an EACCES error if bash was built with AFS workarounds
configured:
#if defined (AFS)
if ((fd < 0) && (errno == EACCES))
{
fd = open (filename, flags & ~O_CREAT, mode);
errno = EACCES; /* restore errno */
}
#endif /* AFS */
The ~O_CREAT fallback logic was introduced to workaround a bug[1] in the
IBM AFS 3.1 cache manager and server which can return EACCES in preference
to EEXIST if the requested file exists but the caller is neither granted
explicit PRSFS_READ permission nor is the file owner and is granted
PRSFS_INSERT permission on the directory. IBM AFS 3.2 altered the cache
manager permission checks but failed to correct the permission checks in
the AFS server. As of this writing, all IBM AFS derived servers continue
to return EACCES in preference to EEXIST when these conditions are met.
Bug reports have been filed with all implementations.
As an unintended side effect, the Bash fallback logic also undermines the
Linux kernel protections against O_CREAT opening FIFOs and regular files
not owned by the user in world writeable sticky directories - unless the
owner is the same as that of the directory - as was added in commit
30aba6656f61e ("namei: allow restricted O_CREAT of FIFOs and regular
files").
As a result the Bash fallback logic masks an incompatibility between the
ownership checks performed by may_create_in_sticky() and network
filesystems such as AFS where the uid namespace is disjoint from the uid
namespace of the local system.
However, the bash work around is going to be removed[2].
Fix this in the kernel by:
(1) Provide an ->is_owned_by_me() inode op, similar to ->permission(),
and, if provided, call that to determine if the caller owns the file
instead of checking the i_uid to current_fsuid().
(2) Provide a ->have_same_owner() inode op, that, if provided, can be
called to see if an inode has the same owner as the parent on the path
walked.
For kafs, use the first hook to check to see if the server indicated the
ADMINISTER bit in the access rights returned by the FS.FetchStatus and
suchlike and the second hook to compare the AFS user IDs retrieved by
FS.FetchStatus (which may not fit in a kuid if AuriStor's YFS variant).
These hooks should probably be used in other places too, and a follow-up
patch will be submitted for that.
This can be tested by creating a sticky directory (the user must have a
token to do this) and creating a file in it. Then strace bash doing "echo
foo >>file" and look at whether bash does a single, successful O_CREAT open
on the file or whether that one fails and then bash does one without
O_CREAT that succeeds.
Fixes: 30aba6656f61 ("namei: allow restricted O_CREAT of FIFOs and regular files")
Reported-by: Etienne Champetier <champetier.etienne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Chet Ramey <chet.ramey@case.edu>
cc: Cheyenne Wills <cwills@sinenomine.net>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ [1]
Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733 [2]
---
fs/afs/dir.c | 2 ++
fs/afs/file.c | 2 ++
fs/afs/internal.h | 3 +++
fs/afs/security.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++
fs/internal.h | 1 +
fs/namei.c | 50 +++++++++++++++++++++++++++++++++++---------
include/linux/fs.h | 3 +++
7 files changed, 103 insertions(+), 10 deletions(-)
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 9e7b1fe82c27..6360db1673b0 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -65,6 +65,8 @@ const struct inode_operations afs_dir_inode_operations = {
.permission = afs_permission,
.getattr = afs_getattr,
.setattr = afs_setattr,
+ .is_owned_by_me = afs_is_owned_by_me,
+ .have_same_owner = afs_have_same_owner,
};
const struct address_space_operations afs_dir_aops = {
diff --git a/fs/afs/file.c b/fs/afs/file.c
index fc15497608c6..0317f0a36cf2 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -47,6 +47,8 @@ const struct inode_operations afs_file_inode_operations = {
.getattr = afs_getattr,
.setattr = afs_setattr,
.permission = afs_permission,
+ .is_owned_by_me = afs_is_owned_by_me,
+ .have_same_owner = afs_have_same_owner,
};
const struct address_space_operations afs_file_aops = {
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 440b0e731093..fbfbf615abe3 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1495,6 +1495,9 @@ extern struct key *afs_request_key(struct afs_cell *);
extern struct key *afs_request_key_rcu(struct afs_cell *);
extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *);
extern int afs_permission(struct mnt_idmap *, struct inode *, int);
+int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
+int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
+ struct dentry *dentry);
extern void __exit afs_clean_up_permit_cache(void);
/*
diff --git a/fs/afs/security.c b/fs/afs/security.c
index 6a7744c9e2a2..a49070c8342d 100644
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -477,6 +477,58 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode,
return ret;
}
+/*
+ * Determine if an inode is owned by 'me' - whatever that means for the
+ * filesystem. In the case of AFS, this means that the file is owned by the
+ * AFS user represented by the token (e.g. from a kerberos server) held in a
+ * key. Returns 0 if owned by me, 1 if not; can also return an error.
+ */
+int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
+{
+ struct afs_vnode *vnode = AFS_FS_I(inode);
+ afs_access_t access;
+ struct key *key;
+ int ret;
+
+ key = afs_request_key(vnode->volume->cell);
+ if (IS_ERR(key))
+ return PTR_ERR(key);
+
+ /* Get the access rights for the key on this file. */
+ ret = afs_check_permit(vnode, key, &access);
+ if (ret < 0)
+ goto error;
+
+ /* We get the ADMINISTER bit if we own the file. */
+ ret = (access & AFS_ACE_ADMINISTER) ? 0 : 1;
+error:
+ key_put(key);
+ return ret;
+}
+
+/*
+ * Determine if a file has the same owner as its parent - whatever that means
+ * for the filesystem. In the case of AFS, this means comparing their AFS
+ * UIDs. Returns 0 if same, 1 if not same; can also return an error.
+ */
+int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
+ struct dentry *dentry)
+{
+ struct afs_vnode *vnode = AFS_FS_I(inode), *dvnode;
+ struct dentry *parent;
+ s64 owner;
+
+ /* Get the owner's ID for the directory. Ideally, we'd use RCU to
+ * access the parent rather than getting a ref.
+ */
+ parent = dget_parent(dentry);
+ dvnode = AFS_FS_I(d_backing_inode(parent));
+ owner = dvnode->status.owner;
+ dput(parent);
+
+ return vnode->status.owner != owner;
+}
+
void __exit afs_clean_up_permit_cache(void)
{
int i;
diff --git a/fs/internal.h b/fs/internal.h
index b9b3e29a73fd..9e84bfc5aee6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -52,6 +52,7 @@ extern int finish_clean_context(struct fs_context *fc);
/*
* namei.c
*/
+int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
struct path *path, struct path *root);
int do_rmdir(int dfd, struct filename *name);
diff --git a/fs/namei.c b/fs/namei.c
index 84a0e0b0111c..9f42dc46322f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -53,8 +53,8 @@
* The new code replaces the old recursive symlink resolution with
* an iterative one (in case of non-nested symlink chains). It does
* this with calls to <fs>_follow_link().
- * As a side effect, dir_namei(), _namei() and follow_link() are now
- * replaced with a single function lookup_dentry() that can handle all
+ * As a side effect, dir_namei(), _namei() and follow_link() are now
+ * replaced with a single function lookup_dentry() that can handle all
* the special cases of the former code.
*
* With the new dcache, the pathname is stored at each inode, at least as
@@ -1149,6 +1149,36 @@ fs_initcall(init_fs_namei_sysctls);
#endif /* CONFIG_SYSCTL */
+/*
+ * Determine if an inode is owned by the process (allowing for fsuid override),
+ * returning 0 if so, 1 if not and a negative error code if there was a problem
+ * making the determination.
+ */
+int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
+{
+ if (unlikely(inode->i_op->is_owned_by_me))
+ return inode->i_op->is_owned_by_me(idmap, inode);
+
+ return vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
+ current_fsuid()) ? 1 : 0;
+}
+
+/*
+ * Determine if two inodes have the same owner, returning 0 if so, 1 if not and
+ * a negative error code if there was a problem making the determination.
+ */
+static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
+ const struct nameidata *nd)
+{
+ if (unlikely(inode->i_op->have_same_owner))
+ return inode->i_op->have_same_owner(idmap, inode, nd->path.dentry);
+
+ if (vfsuid_valid(nd->dir_vfsuid) &&
+ vfsuid_eq(i_uid_into_vfsuid(idmap, inode), nd->dir_vfsuid))
+ return 0;
+ return 1; /* Not same. */
+}
+
/**
* may_follow_link - Check symlink following for unsafe situations
* @nd: nameidata pathwalk data
@@ -1302,10 +1332,10 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link)
* Returns 0 if the open is allowed, -ve on error.
*/
static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
- struct inode *const inode)
+ struct inode *inode)
{
umode_t dir_mode = nd->dir_mode;
- vfsuid_t dir_vfsuid = nd->dir_vfsuid, i_vfsuid;
+ int ret;
if (likely(!(dir_mode & S_ISVTX)))
return 0;
@@ -1316,13 +1346,13 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos)
return 0;
- i_vfsuid = i_uid_into_vfsuid(idmap, inode);
-
- if (vfsuid_eq(i_vfsuid, dir_vfsuid))
- return 0;
+ ret = vfs_inodes_have_same_owner(idmap, inode, nd);
+ if (ret <= 0)
+ return ret;
- if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
- return 0;
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
if (likely(dir_mode & 0002)) {
audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create");
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 016b0fe1536e..ec278d2d362a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2236,6 +2236,9 @@ struct inode_operations {
struct dentry *dentry, struct fileattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
+ int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
+ int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode,
+ struct dentry *dentry);
} ____cacheline_aligned;
static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] vfs: Fix inode ownership checks with regard to foreign ownership
2025-05-19 16:11 [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS David Howells
2025-05-19 16:11 ` [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir David Howells
@ 2025-05-19 16:11 ` David Howells
2025-05-30 1:03 ` Jeffrey E Altman
1 sibling, 1 reply; 9+ messages in thread
From: David Howells @ 2025-05-19 16:11 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Marc Dionne, linux-afs, linux-fsdevel,
linux-kernel, Etienne Champetier, Jeffrey Altman, Chet Ramey,
Cheyenne Wills, Alexander Viro, Christian Brauner, Steve French,
Mimi Zohar, openafs-devel, linux-cifs, linux-integrity
Fix a number of ownership checks made by the VFS that assume that
inode->i_uid is meaningful with respect to the UID space of the system
performing the check. Network filesystems, however, may violate this
assumption - and, indeed, a network filesystem may not even have an actual
concept of a UNIX integer UID (cifs, for example).
There are a number of places within the VFS where UID checks are made and
some of these should be deferring the interpretation to the filesystem by
way of the previously added vfs_inode_is_owned_by_me() and
vfs_inodes_have_same_owner():
(*) chown_ok()
(*) chgrp_ok()
These should call vfs_inode_is_owned_by_me(). Possibly these need to
defer all their checks to the network filesystem as the interpretation
of the new UID/GID depends on the netfs too, but the ->setattr()
method gets a chance to deal with that.
(*) do_coredump()
Should probably call vfs_is_owned_by_me() to check that the file
created is owned by the caller - but the check that's there might be
sufficient.
(*) inode_owner_or_capable()
Should call vfs_is_owned_by_me(). I'm not sure whether the namespace
mapping makes sense in such a case, but it probably could be used.
(*) vfs_setlease()
Should call vfs_is_owned_by_me(). Actually, it should query if
leasing is permitted.
Also, setting locks could perhaps do with a permission call to the
filesystem driver as AFS, for example, has a lock permission bit in
the ACL, but since the AFS server checks that when the RPC call is
made, it's probably unnecessary.
(*) acl_permission_check()
(*) posix_acl_permission()
These functions are only used by generic_permission() which is
overridden if ->permission() is supplied, and when evaluating a POSIX
ACL, it should arguably be checking the UID anyway.
AFS, for example, implements its own ACLs and evaluates them in
->permission() and on the server.
(*) may_follow_link()
Should call vfs_is_owned_by_me() and also vfs_have_same_owner() on the
the link and its parent dir. The latter only applies on
world-writable sticky dirs.
(*) may_create_in_sticky()
The initial subject of this patch. Should call vfs_is_owned_by_me()
and also vfs_have_same_owner() both.
(*) __check_sticky()
Should call vfs_is_owned_by_me() on both the dir and the inode.
(*) may_dedupe_file()
Should call vfs_is_owned_by_me().
(*) IMA policy ops.
I'm not sure what the best way to deal with this is - if, indeed, it
needs any changes.
Note that wrapping stuff up into vfs_inode_is_owned_by_me() isn't
necessarily the most efficient as it means we may end up doing the uid
idmapping an extra time - though mostly this is in places where I'm not
sure it matters so much.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Etienne Champetier <champetier.etienne@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Chet Ramey <chet.ramey@case.edu>
cc: Cheyenne Wills <cwills@sinenomine.net>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Mimi Zohar <zohar@linux.ibm.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-integrity@vger.kernel.org
Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ
Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733
---
fs/attr.c | 58 +++++++++++++++++++++++++++++-------------------
fs/coredump.c | 3 +--
fs/inode.c | 8 +++++--
fs/locks.c | 7 ++++--
fs/namei.c | 30 +++++++++++++------------
fs/remap_range.c | 20 +++++++++--------
6 files changed, 74 insertions(+), 52 deletions(-)
diff --git a/fs/attr.c b/fs/attr.c
index 9caf63d20d03..fefd92af56a2 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -16,6 +16,7 @@
#include <linux/fcntl.h>
#include <linux/filelock.h>
#include <linux/security.h>
+#include "internal.h"
/**
* setattr_should_drop_sgid - determine whether the setgid bit needs to be
@@ -91,19 +92,21 @@ EXPORT_SYMBOL(setattr_should_drop_suidgid);
* permissions. On non-idmapped mounts or if permission checking is to be
* performed on the raw inode simply pass @nop_mnt_idmap.
*/
-static bool chown_ok(struct mnt_idmap *idmap,
- const struct inode *inode, vfsuid_t ia_vfsuid)
+static int chown_ok(struct mnt_idmap *idmap,
+ const struct inode *inode, vfsuid_t ia_vfsuid)
{
vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
- if (vfsuid_eq_kuid(vfsuid, current_fsuid()) &&
- vfsuid_eq(ia_vfsuid, vfsuid))
- return true;
+ int ret;
+
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
- return true;
+ return 0;
if (!vfsuid_valid(vfsuid) &&
ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
- return true;
- return false;
+ return 0;
+ return -EPERM;
}
/**
@@ -118,23 +121,27 @@ static bool chown_ok(struct mnt_idmap *idmap,
* permissions. On non-idmapped mounts or if permission checking is to be
* performed on the raw inode simply pass @nop_mnt_idmap.
*/
-static bool chgrp_ok(struct mnt_idmap *idmap,
- const struct inode *inode, vfsgid_t ia_vfsgid)
+static int chgrp_ok(struct mnt_idmap *idmap,
+ const struct inode *inode, vfsgid_t ia_vfsgid)
{
vfsgid_t vfsgid = i_gid_into_vfsgid(idmap, inode);
- vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
- if (vfsuid_eq_kuid(vfsuid, current_fsuid())) {
+ int ret;
+
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret < 0)
+ return ret;
+ if (ret == 0) {
if (vfsgid_eq(ia_vfsgid, vfsgid))
- return true;
+ return 0;
if (vfsgid_in_group_p(ia_vfsgid))
- return true;
+ return 0;
}
if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
- return true;
+ return 0;
if (!vfsgid_valid(vfsgid) &&
ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
- return true;
- return false;
+ return 0;
+ return -EPERM;
}
/**
@@ -163,6 +170,7 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
{
struct inode *inode = d_inode(dentry);
unsigned int ia_valid = attr->ia_valid;
+ int ret;
/*
* First check size constraints. These can't be overriden using
@@ -179,14 +187,18 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
goto kill_priv;
/* Make sure a caller can chown. */
- if ((ia_valid & ATTR_UID) &&
- !chown_ok(idmap, inode, attr->ia_vfsuid))
- return -EPERM;
+ if (ia_valid & ATTR_UID) {
+ ret = chown_ok(idmap, inode, attr->ia_vfsuid);
+ if (ret < 0)
+ return ret;
+ }
/* Make sure caller can chgrp. */
- if ((ia_valid & ATTR_GID) &&
- !chgrp_ok(idmap, inode, attr->ia_vfsgid))
- return -EPERM;
+ if (ia_valid & ATTR_GID) {
+ ret = chgrp_ok(idmap, inode, attr->ia_vfsgid);
+ if (ret < 0)
+ return ret;
+ }
/* Make sure a caller can chmod. */
if (ia_valid & ATTR_MODE) {
diff --git a/fs/coredump.c b/fs/coredump.c
index c33c177a701b..ded3936b2067 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -720,8 +720,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
* filesystem.
*/
idmap = file_mnt_idmap(cprm.file);
- if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
- current_fsuid())) {
+ if (vfs_inode_is_owned_by_me(idmap, inode) != 0) {
coredump_report_failure("Core dump to %s aborted: "
"cannot preserve file owner", cn.corename);
goto close_fail;
diff --git a/fs/inode.c b/fs/inode.c
index 99318b157a9a..7e91b6f87757 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2586,11 +2586,15 @@ bool inode_owner_or_capable(struct mnt_idmap *idmap,
{
vfsuid_t vfsuid;
struct user_namespace *ns;
+ int ret;
- vfsuid = i_uid_into_vfsuid(idmap, inode);
- if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret == 0)
return true;
+ if (ret < 0)
+ return false;
+ vfsuid = i_uid_into_vfsuid(idmap, inode);
ns = current_user_ns();
if (vfsuid_has_mapping(ns, vfsuid) && ns_capable(ns, CAP_FOWNER))
return true;
diff --git a/fs/locks.c b/fs/locks.c
index 1619cddfa7a4..b7a2a3ab7529 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -68,6 +68,7 @@
#include <trace/events/filelock.h>
#include <linux/uaccess.h>
+#include "internal.h"
static struct file_lock *file_lock(struct file_lock_core *flc)
{
@@ -2013,10 +2014,12 @@ int
vfs_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
{
struct inode *inode = file_inode(filp);
- vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
int error;
- if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
+ error = vfs_inode_is_owned_by_me(file_mnt_idmap(filp), inode);
+ if (error < 0)
+ return error;
+ if (error != 0 && !capable(CAP_LEASE))
return -EACCES;
if (!S_ISREG(inode->i_mode))
return -EINVAL;
diff --git a/fs/namei.c b/fs/namei.c
index 9f42dc46322f..6ede952d424a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1195,26 +1195,26 @@ static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *ino
*
* Returns 0 if following the symlink is allowed, -ve on error.
*/
-static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
+static inline int may_follow_link(struct nameidata *nd, struct inode *inode)
{
struct mnt_idmap *idmap;
- vfsuid_t vfsuid;
+ int ret;
if (!sysctl_protected_symlinks)
return 0;
- idmap = mnt_idmap(nd->path.mnt);
- vfsuid = i_uid_into_vfsuid(idmap, inode);
- /* Allowed if owner and follower match. */
- if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
- return 0;
-
/* Allowed if parent directory not sticky and world-writable. */
if ((nd->dir_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
return 0;
+ idmap = mnt_idmap(nd->path.mnt);
+ /* Allowed if owner and follower match. */
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
+
/* Allowed if parent directory and link owner match. */
- if (vfsuid_valid(nd->dir_vfsuid) && vfsuid_eq(nd->dir_vfsuid, vfsuid))
+ if (vfs_inodes_have_same_owner(idmap, inode, nd))
return 0;
if (nd->flags & LOOKUP_RCU)
@@ -3157,12 +3157,14 @@ EXPORT_SYMBOL(user_path_at);
int __check_sticky(struct mnt_idmap *idmap, struct inode *dir,
struct inode *inode)
{
- kuid_t fsuid = current_fsuid();
+ int ret;
- if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), fsuid))
- return 0;
- if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, dir), fsuid))
- return 0;
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
+ ret = vfs_inode_is_owned_by_me(idmap, dir);
+ if (ret <= 0)
+ return ret;
return !capable_wrt_inode_uidgid(idmap, inode, CAP_FOWNER);
}
EXPORT_SYMBOL(__check_sticky);
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 26afbbbfb10c..9eee93c27001 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -413,20 +413,22 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
EXPORT_SYMBOL(vfs_clone_file_range);
/* Check whether we are allowed to dedupe the destination file */
-static bool may_dedupe_file(struct file *file)
+static int may_dedupe_file(struct file *file)
{
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct inode *inode = file_inode(file);
+ int ret;
if (capable(CAP_SYS_ADMIN))
- return true;
+ return 0;
if (file->f_mode & FMODE_WRITE)
- return true;
- if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid()))
- return true;
+ return 0;
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
if (!inode_permission(idmap, inode, MAY_WRITE))
- return true;
- return false;
+ return 0;
+ return -EPERM;
}
loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
@@ -459,8 +461,8 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
if (ret)
return ret;
- ret = -EPERM;
- if (!may_dedupe_file(dst_file))
+ ret = may_dedupe_file(dst_file);
+ if (ret < 0)
goto out_drop_write;
ret = -EXDEV;
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
2025-05-19 16:11 ` [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir David Howells
@ 2025-05-30 0:11 ` Jeffrey E Altman
2025-05-30 2:42 ` Jeffrey E Altman
0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey E Altman @ 2025-05-30 0:11 UTC (permalink / raw)
To: David Howells, Christian Brauner
Cc: Marc Dionne, linux-afs, linux-fsdevel, linux-kernel,
Etienne Champetier, Chet Ramey, Cheyenne Wills, Alexander Viro,
Christian Brauner, Steve French, openafs-devel, linux-cifs
[-- Attachment #1: Type: text/plain, Size: 13047 bytes --]
On 5/19/2025 12:11 PM, David Howells wrote:
> Since version 1.11 (January 1992) Bash has a work around in redir_open()
> that causes open(O_CREAT) of a file to be retried without O_CREAT if open()
> fails with an EACCES error if bash was built with AFS workarounds
> configured:
>
> #if defined (AFS)
> if ((fd < 0) && (errno == EACCES))
> {
> fd = open (filename, flags & ~O_CREAT, mode);
> errno = EACCES; /* restore errno */
> }
>
> #endif /* AFS */
>
> The ~O_CREAT fallback logic was introduced to workaround a bug[1] in the
> IBM AFS 3.1 cache manager and server which can return EACCES in preference
> to EEXIST if the requested file exists but the caller is neither granted
> explicit PRSFS_READ permission nor is the file owner and is granted
> PRSFS_INSERT permission on the directory. IBM AFS 3.2 altered the cache
> manager permission checks but failed to correct the permission checks in
> the AFS server. As of this writing, all IBM AFS derived servers continue
> to return EACCES in preference to EEXIST when these conditions are met.
> Bug reports have been filed with all implementations.
>
> As an unintended side effect, the Bash fallback logic also undermines the
> Linux kernel protections against O_CREAT opening FIFOs and regular files
> not owned by the user in world writeable sticky directories - unless the
> owner is the same as that of the directory - as was added in commit
> 30aba6656f61e ("namei: allow restricted O_CREAT of FIFOs and regular
> files").
>
> As a result the Bash fallback logic masks an incompatibility between the
> ownership checks performed by may_create_in_sticky() and network
> filesystems such as AFS where the uid namespace is disjoint from the uid
> namespace of the local system.
>
> However, the bash work around is going to be removed[2].
>
> Fix this in the kernel by:
>
> (1) Provide an ->is_owned_by_me() inode op, similar to ->permission(),
> and, if provided, call that to determine if the caller owns the file
> instead of checking the i_uid to current_fsuid().
>
> (2) Provide a ->have_same_owner() inode op, that, if provided, can be
> called to see if an inode has the same owner as the parent on the path
> walked.
>
> For kafs, use the first hook to check to see if the server indicated the
> ADMINISTER bit in the access rights returned by the FS.FetchStatus and
> suchlike and the second hook to compare the AFS user IDs retrieved by
> FS.FetchStatus (which may not fit in a kuid if AuriStor's YFS variant).
>
> These hooks should probably be used in other places too, and a follow-up
> patch will be submitted for that.
>
> This can be tested by creating a sticky directory (the user must have a
> token to do this) and creating a file in it. Then strace bash doing "echo
> foo >>file" and look at whether bash does a single, successful O_CREAT open
> on the file or whether that one fails and then bash does one without
> O_CREAT that succeeds.
>
> Fixes: 30aba6656f61 ("namei: allow restricted O_CREAT of FIFOs and regular files")
> Reported-by: Etienne Champetier <champetier.etienne@gmail.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Jeffrey Altman <jaltman@auristor.com>
> cc: Chet Ramey <chet.ramey@case.edu>
> cc: Cheyenne Wills <cwills@sinenomine.net>
> cc: Alexander Viro <viro@zeniv.linux.org.uk>
> cc: Christian Brauner <brauner@kernel.org>
> cc: Steve French <sfrench@samba.org>
> cc: linux-afs@lists.infradead.org
> cc: openafs-devel@openafs.org
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ [1]
> Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733 [2]
> ---
> fs/afs/dir.c | 2 ++
> fs/afs/file.c | 2 ++
> fs/afs/internal.h | 3 +++
> fs/afs/security.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/internal.h | 1 +
> fs/namei.c | 50 +++++++++++++++++++++++++++++++++++---------
> include/linux/fs.h | 3 +++
> 7 files changed, 103 insertions(+), 10 deletions(-)
>
> diff --git a/fs/afs/dir.c b/fs/afs/dir.c
> index 9e7b1fe82c27..6360db1673b0 100644
> --- a/fs/afs/dir.c
> +++ b/fs/afs/dir.c
> @@ -65,6 +65,8 @@ const struct inode_operations afs_dir_inode_operations = {
> .permission = afs_permission,
> .getattr = afs_getattr,
> .setattr = afs_setattr,
> + .is_owned_by_me = afs_is_owned_by_me,
> + .have_same_owner = afs_have_same_owner,
> };
>
> const struct address_space_operations afs_dir_aops = {
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index fc15497608c6..0317f0a36cf2 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -47,6 +47,8 @@ const struct inode_operations afs_file_inode_operations = {
> .getattr = afs_getattr,
> .setattr = afs_setattr,
> .permission = afs_permission,
> + .is_owned_by_me = afs_is_owned_by_me,
> + .have_same_owner = afs_have_same_owner,
> };
>
> const struct address_space_operations afs_file_aops = {
> diff --git a/fs/afs/internal.h b/fs/afs/internal.h
> index 440b0e731093..fbfbf615abe3 100644
> --- a/fs/afs/internal.h
> +++ b/fs/afs/internal.h
> @@ -1495,6 +1495,9 @@ extern struct key *afs_request_key(struct afs_cell *);
> extern struct key *afs_request_key_rcu(struct afs_cell *);
> extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *);
> extern int afs_permission(struct mnt_idmap *, struct inode *, int);
> +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
> +int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> + struct dentry *dentry);
> extern void __exit afs_clean_up_permit_cache(void);
>
> /*
> diff --git a/fs/afs/security.c b/fs/afs/security.c
> index 6a7744c9e2a2..a49070c8342d 100644
> --- a/fs/afs/security.c
> +++ b/fs/afs/security.c
> @@ -477,6 +477,58 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode,
> return ret;
> }
>
> +/*
> + * Determine if an inode is owned by 'me' - whatever that means for the
> + * filesystem. In the case of AFS, this means that the file is owned by the
> + * AFS user represented by the token (e.g. from a kerberos server) held in a
> + * key. Returns 0 if owned by me, 1 if not; can also return an error.
> + */
Technically AFS tokens are not issued by Kerberos KDCs. Could we say
"... the file is owned
by the AFS user represented by the Rx Security Class token held in a key."?
> +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
> +{
> + struct afs_vnode *vnode = AFS_FS_I(inode);
> + afs_access_t access;
> + struct key *key;
> + int ret;
> +
> + key = afs_request_key(vnode->volume->cell);
> + if (IS_ERR(key))
> + return PTR_ERR(key);
> +
> + /* Get the access rights for the key on this file. */
> + ret = afs_check_permit(vnode, key, &access);
> + if (ret < 0)
> + goto error;
> +
> + /* We get the ADMINISTER bit if we own the file. */
> + ret = (access & AFS_ACE_ADMINISTER) ? 0 : 1;
AFS_ACE_ADMINISTER only means ownership if the inode is a
non-directory. Should
we add an explicit check for inode type?
> +error:
> + key_put(key);
> + return ret;
> +}
> +
> +/*
> + * Determine if a file has the same owner as its parent - whatever that means
> + * for the filesystem. In the case of AFS, this means comparing their AFS
> + * UIDs. Returns 0 if same, 1 if not same; can also return an error.
> + */
> +int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> + struct dentry *dentry)
> +{
> + struct afs_vnode *vnode = AFS_FS_I(inode), *dvnode;
> + struct dentry *parent;
> + s64 owner;
> +
> + /* Get the owner's ID for the directory. Ideally, we'd use RCU to
> + * access the parent rather than getting a ref.
> + */
> + parent = dget_parent(dentry);
> + dvnode = AFS_FS_I(d_backing_inode(parent));
> + owner = dvnode->status.owner;
> + dput(parent);
> +
> + return vnode->status.owner != owner;
> +}
> +
> void __exit afs_clean_up_permit_cache(void)
> {
> int i;
> diff --git a/fs/internal.h b/fs/internal.h
> index b9b3e29a73fd..9e84bfc5aee6 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -52,6 +52,7 @@ extern int finish_clean_context(struct fs_context *fc);
> /*
> * namei.c
> */
> +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
> extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> struct path *path, struct path *root);
> int do_rmdir(int dfd, struct filename *name);
> diff --git a/fs/namei.c b/fs/namei.c
> index 84a0e0b0111c..9f42dc46322f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -53,8 +53,8 @@
> * The new code replaces the old recursive symlink resolution with
> * an iterative one (in case of non-nested symlink chains). It does
> * this with calls to <fs>_follow_link().
> - * As a side effect, dir_namei(), _namei() and follow_link() are now
> - * replaced with a single function lookup_dentry() that can handle all
> + * As a side effect, dir_namei(), _namei() and follow_link() are now
> + * replaced with a single function lookup_dentry() that can handle all
> * the special cases of the former code.
> *
> * With the new dcache, the pathname is stored at each inode, at least as
> @@ -1149,6 +1149,36 @@ fs_initcall(init_fs_namei_sysctls);
>
> #endif /* CONFIG_SYSCTL */
>
> +/*
> + * Determine if an inode is owned by the process (allowing for fsuid override),
> + * returning 0 if so, 1 if not and a negative error code if there was a problem
> + * making the determination.
> + */
> +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
> +{
> + if (unlikely(inode->i_op->is_owned_by_me))
> + return inode->i_op->is_owned_by_me(idmap, inode);
> +
> + return vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
> + current_fsuid()) ? 1 : 0;
> +}
> +
> +/*
> + * Determine if two inodes have the same owner, returning 0 if so, 1 if not and
> + * a negative error code if there was a problem making the determination.
> + */
> +static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> + const struct nameidata *nd)
> +{
> + if (unlikely(inode->i_op->have_same_owner))
> + return inode->i_op->have_same_owner(idmap, inode, nd->path.dentry);
> +
> + if (vfsuid_valid(nd->dir_vfsuid) &&
> + vfsuid_eq(i_uid_into_vfsuid(idmap, inode), nd->dir_vfsuid))
> + return 0;
> + return 1; /* Not same. */
> +}
> +
> /**
> * may_follow_link - Check symlink following for unsafe situations
> * @nd: nameidata pathwalk data
> @@ -1302,10 +1332,10 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link)
> * Returns 0 if the open is allowed, -ve on error.
> */
> static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
> - struct inode *const inode)
> + struct inode *inode)
> {
> umode_t dir_mode = nd->dir_mode;
> - vfsuid_t dir_vfsuid = nd->dir_vfsuid, i_vfsuid;
> + int ret;
>
> if (likely(!(dir_mode & S_ISVTX)))
> return 0;
> @@ -1316,13 +1346,13 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
> if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos)
> return 0;
>
> - i_vfsuid = i_uid_into_vfsuid(idmap, inode);
> -
> - if (vfsuid_eq(i_vfsuid, dir_vfsuid))
> - return 0;
> + ret = vfs_inodes_have_same_owner(idmap, inode, nd);
> + if (ret <= 0)
> + return ret;
>
> - if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
> - return 0;
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
>
> if (likely(dir_mode & 0002)) {
> audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create");
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 016b0fe1536e..ec278d2d362a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2236,6 +2236,9 @@ struct inode_operations {
> struct dentry *dentry, struct fileattr *fa);
> int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
> struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
> + int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
> + int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode,
> + struct dentry *dentry);
> } ____cacheline_aligned;
>
> static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
It would be nice if this patch added documentation for the new
inode_operations to
Documentation/filesystems/vfs.rst.
The may_create_in_sticky() logic looks good to me.
Thanks for the patch.
Jeffrey Altman
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4276 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] vfs: Fix inode ownership checks with regard to foreign ownership
2025-05-19 16:11 ` [PATCH 2/2] vfs: Fix inode ownership checks with regard to foreign ownership David Howells
@ 2025-05-30 1:03 ` Jeffrey E Altman
2025-07-21 16:15 ` Jeffrey Altman
0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey E Altman @ 2025-05-30 1:03 UTC (permalink / raw)
To: David Howells, Christian Brauner
Cc: Marc Dionne, linux-afs, linux-fsdevel, linux-kernel,
Etienne Champetier, Chet Ramey, Cheyenne Wills, Alexander Viro,
Christian Brauner, Steve French, Mimi Zohar, openafs-devel,
linux-cifs, linux-integrity
[-- Attachment #1: Type: text/plain, Size: 13825 bytes --]
On 5/19/2025 12:11 PM, David Howells wrote:
> Fix a number of ownership checks made by the VFS that assume that
> inode->i_uid is meaningful with respect to the UID space of the system
> performing the check. Network filesystems, however, may violate this
> assumption - and, indeed, a network filesystem may not even have an actual
> concept of a UNIX integer UID (cifs, for example).
>
> There are a number of places within the VFS where UID checks are made and
> some of these should be deferring the interpretation to the filesystem by
> way of the previously added vfs_inode_is_owned_by_me() and
> vfs_inodes_have_same_owner():
>
> (*) chown_ok()
> (*) chgrp_ok()
>
> These should call vfs_inode_is_owned_by_me(). Possibly these need to
> defer all their checks to the network filesystem as the interpretation
> of the new UID/GID depends on the netfs too, but the ->setattr()
> method gets a chance to deal with that.
>
> (*) do_coredump()
>
> Should probably call vfs_is_owned_by_me() to check that the file
> created is owned by the caller - but the check that's there might be
> sufficient.
>
> (*) inode_owner_or_capable()
>
> Should call vfs_is_owned_by_me(). I'm not sure whether the namespace
> mapping makes sense in such a case, but it probably could be used.
>
> (*) vfs_setlease()
>
> Should call vfs_is_owned_by_me(). Actually, it should query if
> leasing is permitted.
>
> Also, setting locks could perhaps do with a permission call to the
> filesystem driver as AFS, for example, has a lock permission bit in
> the ACL, but since the AFS server checks that when the RPC call is
> made, it's probably unnecessary.
>
> (*) acl_permission_check()
> (*) posix_acl_permission()
>
> These functions are only used by generic_permission() which is
> overridden if ->permission() is supplied, and when evaluating a POSIX
> ACL, it should arguably be checking the UID anyway.
>
> AFS, for example, implements its own ACLs and evaluates them in
> ->permission() and on the server.
>
> (*) may_follow_link()
>
> Should call vfs_is_owned_by_me() and also vfs_have_same_owner() on the
> the link and its parent dir. The latter only applies on
> world-writable sticky dirs.
>
> (*) may_create_in_sticky()
>
> The initial subject of this patch. Should call vfs_is_owned_by_me()
> and also vfs_have_same_owner() both.
>
> (*) __check_sticky()
>
> Should call vfs_is_owned_by_me() on both the dir and the inode.
>
> (*) may_dedupe_file()
>
> Should call vfs_is_owned_by_me().
>
> (*) IMA policy ops.
>
> I'm not sure what the best way to deal with this is - if, indeed, it
> needs any changes.
>
> Note that wrapping stuff up into vfs_inode_is_owned_by_me() isn't
> necessarily the most efficient as it means we may end up doing the uid
> idmapping an extra time - though mostly this is in places where I'm not
> sure it matters so much.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Etienne Champetier <champetier.etienne@gmail.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Jeffrey Altman <jaltman@auristor.com>
> cc: Chet Ramey <chet.ramey@case.edu>
> cc: Cheyenne Wills <cwills@sinenomine.net>
> cc: Alexander Viro <viro@zeniv.linux.org.uk>
> cc: Christian Brauner <brauner@kernel.org>
> cc: Steve French <sfrench@samba.org>
> cc: Mimi Zohar <zohar@linux.ibm.com>
> cc: linux-afs@lists.infradead.org
> cc: openafs-devel@openafs.org
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-integrity@vger.kernel.org
> Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ
> Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733
> ---
> fs/attr.c | 58 +++++++++++++++++++++++++++++-------------------
> fs/coredump.c | 3 +--
> fs/inode.c | 8 +++++--
> fs/locks.c | 7 ++++--
> fs/namei.c | 30 +++++++++++++------------
> fs/remap_range.c | 20 +++++++++--------
> 6 files changed, 74 insertions(+), 52 deletions(-)
>
> diff --git a/fs/attr.c b/fs/attr.c
> index 9caf63d20d03..fefd92af56a2 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -16,6 +16,7 @@
> #include <linux/fcntl.h>
> #include <linux/filelock.h>
> #include <linux/security.h>
> +#include "internal.h"
>
> /**
> * setattr_should_drop_sgid - determine whether the setgid bit needs to be
> @@ -91,19 +92,21 @@ EXPORT_SYMBOL(setattr_should_drop_suidgid);
> * permissions. On non-idmapped mounts or if permission checking is to be
> * performed on the raw inode simply pass @nop_mnt_idmap.
> */
> -static bool chown_ok(struct mnt_idmap *idmap,
> - const struct inode *inode, vfsuid_t ia_vfsuid)
> +static int chown_ok(struct mnt_idmap *idmap,
> + const struct inode *inode, vfsuid_t ia_vfsuid)
> {
> vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()) &&
> - vfsuid_eq(ia_vfsuid, vfsuid))
> - return true;
> + int ret;
> +
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
> if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
> - return true;
> + return 0;
> if (!vfsuid_valid(vfsuid) &&
> ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
> - return true;
> - return false;
> + return 0;
> + return -EPERM;
> }
>
> /**
> @@ -118,23 +121,27 @@ static bool chown_ok(struct mnt_idmap *idmap,
> * permissions. On non-idmapped mounts or if permission checking is to be
> * performed on the raw inode simply pass @nop_mnt_idmap.
> */
> -static bool chgrp_ok(struct mnt_idmap *idmap,
> - const struct inode *inode, vfsgid_t ia_vfsgid)
> +static int chgrp_ok(struct mnt_idmap *idmap,
> + const struct inode *inode, vfsgid_t ia_vfsgid)
> {
> vfsgid_t vfsgid = i_gid_into_vfsgid(idmap, inode);
> - vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
> - if (vfsuid_eq_kuid(vfsuid, current_fsuid())) {
> + int ret;
> +
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret < 0)
> + return ret;
> + if (ret == 0) {
> if (vfsgid_eq(ia_vfsgid, vfsgid))
> - return true;
> + return 0;
> if (vfsgid_in_group_p(ia_vfsgid))
> - return true;
> + return 0;
> }
> if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
> - return true;
> + return 0;
> if (!vfsgid_valid(vfsgid) &&
> ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
> - return true;
> - return false;
> + return 0;
> + return -EPERM;
> }
>
> /**
> @@ -163,6 +170,7 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
> {
> struct inode *inode = d_inode(dentry);
> unsigned int ia_valid = attr->ia_valid;
> + int ret;
>
> /*
> * First check size constraints. These can't be overriden using
> @@ -179,14 +187,18 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
> goto kill_priv;
>
> /* Make sure a caller can chown. */
> - if ((ia_valid & ATTR_UID) &&
> - !chown_ok(idmap, inode, attr->ia_vfsuid))
> - return -EPERM;
> + if (ia_valid & ATTR_UID) {
> + ret = chown_ok(idmap, inode, attr->ia_vfsuid);
> + if (ret < 0)
> + return ret;
> + }
>
> /* Make sure caller can chgrp. */
> - if ((ia_valid & ATTR_GID) &&
> - !chgrp_ok(idmap, inode, attr->ia_vfsgid))
> - return -EPERM;
> + if (ia_valid & ATTR_GID) {
> + ret = chgrp_ok(idmap, inode, attr->ia_vfsgid);
> + if (ret < 0)
> + return ret;
> + }
>
> /* Make sure a caller can chmod. */
> if (ia_valid & ATTR_MODE) {
> diff --git a/fs/coredump.c b/fs/coredump.c
> index c33c177a701b..ded3936b2067 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -720,8 +720,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> * filesystem.
> */
> idmap = file_mnt_idmap(cprm.file);
> - if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
> - current_fsuid())) {
> + if (vfs_inode_is_owned_by_me(idmap, inode) != 0) {
> coredump_report_failure("Core dump to %s aborted: "
> "cannot preserve file owner", cn.corename);
> goto close_fail;
> diff --git a/fs/inode.c b/fs/inode.c
> index 99318b157a9a..7e91b6f87757 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2586,11 +2586,15 @@ bool inode_owner_or_capable(struct mnt_idmap *idmap,
> {
> vfsuid_t vfsuid;
> struct user_namespace *ns;
> + int ret;
>
> - vfsuid = i_uid_into_vfsuid(idmap, inode);
> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret == 0)
> return true;
> + if (ret < 0)
> + return false;
>
> + vfsuid = i_uid_into_vfsuid(idmap, inode);
> ns = current_user_ns();
> if (vfsuid_has_mapping(ns, vfsuid) && ns_capable(ns, CAP_FOWNER))
> return true;
> diff --git a/fs/locks.c b/fs/locks.c
> index 1619cddfa7a4..b7a2a3ab7529 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -68,6 +68,7 @@
> #include <trace/events/filelock.h>
>
> #include <linux/uaccess.h>
> +#include "internal.h"
>
> static struct file_lock *file_lock(struct file_lock_core *flc)
> {
> @@ -2013,10 +2014,12 @@ int
> vfs_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
> {
> struct inode *inode = file_inode(filp);
> - vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
> int error;
>
> - if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
> + error = vfs_inode_is_owned_by_me(file_mnt_idmap(filp), inode);
> + if (error < 0)
> + return error;
> + if (error != 0 && !capable(CAP_LEASE))
> return -EACCES;
> if (!S_ISREG(inode->i_mode))
> return -EINVAL;
> diff --git a/fs/namei.c b/fs/namei.c
> index 9f42dc46322f..6ede952d424a 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1195,26 +1195,26 @@ static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *ino
> *
> * Returns 0 if following the symlink is allowed, -ve on error.
> */
> -static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
> +static inline int may_follow_link(struct nameidata *nd, struct inode *inode)
> {
> struct mnt_idmap *idmap;
> - vfsuid_t vfsuid;
> + int ret;
>
> if (!sysctl_protected_symlinks)
> return 0;
>
> - idmap = mnt_idmap(nd->path.mnt);
> - vfsuid = i_uid_into_vfsuid(idmap, inode);
> - /* Allowed if owner and follower match. */
> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
> - return 0;
> -
> /* Allowed if parent directory not sticky and world-writable. */
> if ((nd->dir_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
> return 0;
>
> + idmap = mnt_idmap(nd->path.mnt);
> + /* Allowed if owner and follower match. */
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
> +
> /* Allowed if parent directory and link owner match. */
> - if (vfsuid_valid(nd->dir_vfsuid) && vfsuid_eq(nd->dir_vfsuid, vfsuid))
> + if (vfs_inodes_have_same_owner(idmap, inode, nd))
> return 0;
>
> if (nd->flags & LOOKUP_RCU)
> @@ -3157,12 +3157,14 @@ EXPORT_SYMBOL(user_path_at);
> int __check_sticky(struct mnt_idmap *idmap, struct inode *dir,
> struct inode *inode)
> {
> - kuid_t fsuid = current_fsuid();
> + int ret;
>
> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), fsuid))
> - return 0;
> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, dir), fsuid))
> - return 0;
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
> + ret = vfs_inode_is_owned_by_me(idmap, dir);
> + if (ret <= 0)
> + return ret;
> return !capable_wrt_inode_uidgid(idmap, inode, CAP_FOWNER);
> }
> EXPORT_SYMBOL(__check_sticky);
> diff --git a/fs/remap_range.c b/fs/remap_range.c
> index 26afbbbfb10c..9eee93c27001 100644
> --- a/fs/remap_range.c
> +++ b/fs/remap_range.c
> @@ -413,20 +413,22 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> EXPORT_SYMBOL(vfs_clone_file_range);
>
> /* Check whether we are allowed to dedupe the destination file */
> -static bool may_dedupe_file(struct file *file)
> +static int may_dedupe_file(struct file *file)
> {
> struct mnt_idmap *idmap = file_mnt_idmap(file);
> struct inode *inode = file_inode(file);
> + int ret;
>
> if (capable(CAP_SYS_ADMIN))
> - return true;
> + return 0;
> if (file->f_mode & FMODE_WRITE)
> - return true;
> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid()))
> - return true;
> + return 0;
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
> if (!inode_permission(idmap, inode, MAY_WRITE))
> - return true;
> - return false;
> + return 0;
> + return -EPERM;
> }
>
> loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
> @@ -459,8 +461,8 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
> if (ret)
> return ret;
>
> - ret = -EPERM;
> - if (!may_dedupe_file(dst_file))
> + ret = may_dedupe_file(dst_file);
> + if (ret < 0)
> goto out_drop_write;
>
> ret = -EXDEV;
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
This change looks good to me. It should be noted that it assumes that
filesystem specific is_owned_by_me inode_operation can properly handle
all inode types. The preceding change will need a fix for the afs
implementation.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4276 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
2025-05-30 0:11 ` Jeffrey E Altman
@ 2025-05-30 2:42 ` Jeffrey E Altman
0 siblings, 0 replies; 9+ messages in thread
From: Jeffrey E Altman @ 2025-05-30 2:42 UTC (permalink / raw)
To: David Howells, Christian Brauner
Cc: Marc Dionne, linux-afs, linux-fsdevel, linux-kernel,
Etienne Champetier, Chet Ramey, Cheyenne Wills, Alexander Viro,
Christian Brauner, Steve French, openafs-devel, linux-cifs
[-- Attachment #1: Type: text/plain, Size: 2912 bytes --]
On 5/29/2025 8:11 PM, Jeffrey E Altman wrote:
>
> On 5/19/2025 12:11 PM, David Howells wrote:
>> +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
>> +{
>> + struct afs_vnode *vnode = AFS_FS_I(inode);
>> + afs_access_t access;
>> + struct key *key;
>> + int ret;
>> +
>> + key = afs_request_key(vnode->volume->cell);
>> + if (IS_ERR(key))
>> + return PTR_ERR(key);
>> +
>> + /* Get the access rights for the key on this file. */
>> + ret = afs_check_permit(vnode, key, &access);
>> + if (ret < 0)
>> + goto error;
>> +
>> + /* We get the ADMINISTER bit if we own the file. */
>> + ret = (access & AFS_ACE_ADMINISTER) ? 0 : 1;
>
> AFS_ACE_ADMINISTER only means ownership if the inode is a
> non-directory. Should
> we add an explicit check for inode type?
I think the answer is 'yes'. The above check does not work for
directory inodes so
we need another method of answering the question or informing the caller
that the
answer is unknown. In some cases, such as chown() and chgrp() its safe
to say the
caller is the owner and defer the eventual access check to the server
when the inode
is a directory. However, there are other cases such as check_sticky()
where the vfs
cannot defer the decision to the server because there isn't an RPC that
is a one-to-one
match for the decision being made.
In addition to ownership, operations such as chown and chgrp or chmod might
be permitted if the caller is a member of the system:administrators
group. The
server has no method of informing the client that the user is special or
what rights
are granted due to that status.
In the absence of explicit knowledge, the only method by which the
client can
answer the question today would be to issue a PR_WhoAmI RPC to the cell's
Protection Service to obtain the AFS ID of the task's token. The AFS ID
could then
be compared to the struct afs_vnode->status.owner field. However,
clients might
not be permitted to communicate with the Protection Service and PR_WhoAmI is
currently only available from the AuriStorFS Protection Service.
PR_WhoAmI is
standardized for implementation by OpenAFS but at present neither OpenAFS
1.8.x nor 1.9.x include an implementation.
It should be noted that although an AFS ID is provided to the afs client
when an
rxkad token is inserted into the kernel, the provided value cannot be
trusted and
must not be used for this purpose.
>
>> +error:
>> + key_put(key);
>> + return ret;
>> +}
To address the most urgent need which is the may_create_in_sticky()
call, perhaps
introduce a non-directory specific is_file_inode_owned_by_me() as part
of this
change and then address the directory case once we figure out how to
implement
it.
Jeffrey Altman
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4276 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] vfs: Fix inode ownership checks with regard to foreign ownership
2025-05-30 1:03 ` Jeffrey E Altman
@ 2025-07-21 16:15 ` Jeffrey Altman
0 siblings, 0 replies; 9+ messages in thread
From: Jeffrey Altman @ 2025-07-21 16:15 UTC (permalink / raw)
To: David Howells, Christian Brauner
Cc: Marc Dionne, linux-afs, linux-fsdevel, linux-kernel,
Etienne Champetier, Chet Ramey, Cheyenne Wills, Alexander Viro,
Christian Brauner, Steve French, Mimi Zohar, openafs-devel,
linux-cifs, linux-integrity
[-- Attachment #1: Type: text/plain, Size: 15191 bytes --]
> On May 29, 2025, at 9:03 PM, Jeffrey E Altman <jaltman@auristor.com> wrote:
>
> On 5/19/2025 12:11 PM, David Howells wrote:
>
>> Fix a number of ownership checks made by the VFS that assume that
>> inode->i_uid is meaningful with respect to the UID space of the system
>> performing the check. Network filesystems, however, may violate this
>> assumption - and, indeed, a network filesystem may not even have an actual
>> concept of a UNIX integer UID (cifs, for example).
>>
>> There are a number of places within the VFS where UID checks are made and
>> some of these should be deferring the interpretation to the filesystem by
>> way of the previously added vfs_inode_is_owned_by_me() and
>> vfs_inodes_have_same_owner():
>>
>> (*) chown_ok()
>> (*) chgrp_ok()
>>
>> These should call vfs_inode_is_owned_by_me(). Possibly these need to
>> defer all their checks to the network filesystem as the interpretation
>> of the new UID/GID depends on the netfs too, but the ->setattr()
>> method gets a chance to deal with that.
>>
>> (*) do_coredump()
>>
>> Should probably call vfs_is_owned_by_me() to check that the file
>> created is owned by the caller - but the check that's there might be
>> sufficient.
>>
>> (*) inode_owner_or_capable()
>>
>> Should call vfs_is_owned_by_me(). I'm not sure whether the namespace
>> mapping makes sense in such a case, but it probably could be used.
>>
>> (*) vfs_setlease()
>>
>> Should call vfs_is_owned_by_me(). Actually, it should query if
>> leasing is permitted.
>>
>> Also, setting locks could perhaps do with a permission call to the
>> filesystem driver as AFS, for example, has a lock permission bit in
>> the ACL, but since the AFS server checks that when the RPC call is
>> made, it's probably unnecessary.
>>
>> (*) acl_permission_check()
>> (*) posix_acl_permission()
>>
>> These functions are only used by generic_permission() which is
>> overridden if ->permission() is supplied, and when evaluating a POSIX
>> ACL, it should arguably be checking the UID anyway.
>>
>> AFS, for example, implements its own ACLs and evaluates them in
>> ->permission() and on the server.
>>
>> (*) may_follow_link()
>>
>> Should call vfs_is_owned_by_me() and also vfs_have_same_owner() on the
>> the link and its parent dir. The latter only applies on
>> world-writable sticky dirs.
>>
>> (*) may_create_in_sticky()
>>
>> The initial subject of this patch. Should call vfs_is_owned_by_me()
>> and also vfs_have_same_owner() both.
>>
>> (*) __check_sticky()
>>
>> Should call vfs_is_owned_by_me() on both the dir and the inode.
>>
>> (*) may_dedupe_file()
>>
>> Should call vfs_is_owned_by_me().
>>
>> (*) IMA policy ops.
>>
>> I'm not sure what the best way to deal with this is - if, indeed, it
>> needs any changes.
>>
>> Note that wrapping stuff up into vfs_inode_is_owned_by_me() isn't
>> necessarily the most efficient as it means we may end up doing the uid
>> idmapping an extra time - though mostly this is in places where I'm not
>> sure it matters so much.
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Etienne Champetier <champetier.etienne@gmail.com>
>> cc: Marc Dionne <marc.dionne@auristor.com>
>> cc: Jeffrey Altman <jaltman@auristor.com>
>> cc: Chet Ramey <chet.ramey@case.edu>
>> cc: Cheyenne Wills <cwills@sinenomine.net>
>> cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> cc: Christian Brauner <brauner@kernel.org>
>> cc: Steve French <sfrench@samba.org>
>> cc: Mimi Zohar <zohar@linux.ibm.com>
>> cc: linux-afs@lists.infradead.org
>> cc: openafs-devel@openafs.org
>> cc: linux-cifs@vger.kernel.org
>> cc: linux-fsdevel@vger.kernel.org
>> cc: linux-integrity@vger.kernel.org
>> Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ
>> Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733
>> ---
>> fs/attr.c | 58 +++++++++++++++++++++++++++++-------------------
>> fs/coredump.c | 3 +--
>> fs/inode.c | 8 +++++--
>> fs/locks.c | 7 ++++--
>> fs/namei.c | 30 +++++++++++++------------
>> fs/remap_range.c | 20 +++++++++--------
>> 6 files changed, 74 insertions(+), 52 deletions(-)
>>
>> diff --git a/fs/attr.c b/fs/attr.c
>> index 9caf63d20d03..fefd92af56a2 100644
>> --- a/fs/attr.c
>> +++ b/fs/attr.c
>> @@ -16,6 +16,7 @@
>> #include <linux/fcntl.h>
>> #include <linux/filelock.h>
>> #include <linux/security.h>
>> +#include "internal.h"
>> /**
>> * setattr_should_drop_sgid - determine whether the setgid bit needs to be
>> @@ -91,19 +92,21 @@ EXPORT_SYMBOL(setattr_should_drop_suidgid);
>> * permissions. On non-idmapped mounts or if permission checking is to be
>> * performed on the raw inode simply pass @nop_mnt_idmap.
>> */
>> -static bool chown_ok(struct mnt_idmap *idmap,
>> - const struct inode *inode, vfsuid_t ia_vfsuid)
>> +static int chown_ok(struct mnt_idmap *idmap,
>> + const struct inode *inode, vfsuid_t ia_vfsuid)
>> {
>> vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
>> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()) &&
>> - vfsuid_eq(ia_vfsuid, vfsuid))
>> - return true;
>> + int ret;
>> +
>> + ret = vfs_inode_is_owned_by_me(idmap, inode);
>> + if (ret <= 0)
>> + return ret;
>> if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
>> - return true;
>> + return 0;
>> if (!vfsuid_valid(vfsuid) &&
>> ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
>> - return true;
>> - return false;
>> + return 0;
>> + return -EPERM;
>> }
>> /**
>> @@ -118,23 +121,27 @@ static bool chown_ok(struct mnt_idmap *idmap,
>> * permissions. On non-idmapped mounts or if permission checking is to be
>> * performed on the raw inode simply pass @nop_mnt_idmap.
>> */
>> -static bool chgrp_ok(struct mnt_idmap *idmap,
>> - const struct inode *inode, vfsgid_t ia_vfsgid)
>> +static int chgrp_ok(struct mnt_idmap *idmap,
>> + const struct inode *inode, vfsgid_t ia_vfsgid)
>> {
>> vfsgid_t vfsgid = i_gid_into_vfsgid(idmap, inode);
>> - vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
>> - if (vfsuid_eq_kuid(vfsuid, current_fsuid())) {
>> + int ret;
>> +
>> + ret = vfs_inode_is_owned_by_me(idmap, inode);
>> + if (ret < 0)
>> + return ret;
>> + if (ret == 0) {
>> if (vfsgid_eq(ia_vfsgid, vfsgid))
>> - return true;
>> + return 0;
>> if (vfsgid_in_group_p(ia_vfsgid))
>> - return true;
>> + return 0;
>> }
>> if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
>> - return true;
>> + return 0;
>> if (!vfsgid_valid(vfsgid) &&
>> ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
>> - return true;
>> - return false;
>> + return 0;
>> + return -EPERM;
>> }
>> /**
>> @@ -163,6 +170,7 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
>> {
>> struct inode *inode = d_inode(dentry);
>> unsigned int ia_valid = attr->ia_valid;
>> + int ret;
>> /*
>> * First check size constraints. These can't be overriden using
>> @@ -179,14 +187,18 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
>> goto kill_priv;
>> /* Make sure a caller can chown. */
>> - if ((ia_valid & ATTR_UID) &&
>> - !chown_ok(idmap, inode, attr->ia_vfsuid))
>> - return -EPERM;
>> + if (ia_valid & ATTR_UID) {
>> + ret = chown_ok(idmap, inode, attr->ia_vfsuid);
>> + if (ret < 0)
>> + return ret;
>> + }
>> /* Make sure caller can chgrp. */
>> - if ((ia_valid & ATTR_GID) &&
>> - !chgrp_ok(idmap, inode, attr->ia_vfsgid))
>> - return -EPERM;
>> + if (ia_valid & ATTR_GID) {
>> + ret = chgrp_ok(idmap, inode, attr->ia_vfsgid);
>> + if (ret < 0)
>> + return ret;
>> + }
>> /* Make sure a caller can chmod. */
>> if (ia_valid & ATTR_MODE) {
>> diff --git a/fs/coredump.c b/fs/coredump.c
>> index c33c177a701b..ded3936b2067 100644
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -720,8 +720,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>> * filesystem.
>> */
>> idmap = file_mnt_idmap(cprm.file);
>> - if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
>> - current_fsuid())) {
>> + if (vfs_inode_is_owned_by_me(idmap, inode) != 0) {
>> coredump_report_failure("Core dump to %s aborted: "
>> "cannot preserve file owner", cn.corename);
>> goto close_fail;
>> diff --git a/fs/inode.c b/fs/inode.c
>> index 99318b157a9a..7e91b6f87757 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -2586,11 +2586,15 @@ bool inode_owner_or_capable(struct mnt_idmap *idmap,
>> {
>> vfsuid_t vfsuid;
>> struct user_namespace *ns;
>> + int ret;
>> - vfsuid = i_uid_into_vfsuid(idmap, inode);
>> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
>> + ret = vfs_inode_is_owned_by_me(idmap, inode);
>> + if (ret == 0)
>> return true;
>> + if (ret < 0)
>> + return false;
>> + vfsuid = i_uid_into_vfsuid(idmap, inode);
>> ns = current_user_ns();
>> if (vfsuid_has_mapping(ns, vfsuid) && ns_capable(ns, CAP_FOWNER))
>> return true;
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 1619cddfa7a4..b7a2a3ab7529 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -68,6 +68,7 @@
>> #include <trace/events/filelock.h>
>> #include <linux/uaccess.h>
>> +#include "internal.h"
>> static struct file_lock *file_lock(struct file_lock_core *flc)
>> {
>> @@ -2013,10 +2014,12 @@ int
>> vfs_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
>> {
>> struct inode *inode = file_inode(filp);
>> - vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
>> int error;
>> - if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
>> + error = vfs_inode_is_owned_by_me(file_mnt_idmap(filp), inode);
>> + if (error < 0)
>> + return error;
>> + if (error != 0 && !capable(CAP_LEASE))
>> return -EACCES;
>> if (!S_ISREG(inode->i_mode))
>> return -EINVAL;
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 9f42dc46322f..6ede952d424a 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -1195,26 +1195,26 @@ static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *ino
>> *
>> * Returns 0 if following the symlink is allowed, -ve on error.
>> */
>> -static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
>> +static inline int may_follow_link(struct nameidata *nd, struct inode *inode)
>> {
>> struct mnt_idmap *idmap;
>> - vfsuid_t vfsuid;
>> + int ret;
>> if (!sysctl_protected_symlinks)
>> return 0;
>> - idmap = mnt_idmap(nd->path.mnt);
>> - vfsuid = i_uid_into_vfsuid(idmap, inode);
>> - /* Allowed if owner and follower match. */
>> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
>> - return 0;
>> -
>> /* Allowed if parent directory not sticky and world-writable. */
>> if ((nd->dir_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
>> return 0;
>> + idmap = mnt_idmap(nd->path.mnt);
>> + /* Allowed if owner and follower match. */
>> + ret = vfs_inode_is_owned_by_me(idmap, inode);
>> + if (ret <= 0)
>> + return ret;
>> +
>> /* Allowed if parent directory and link owner match. */
>> - if (vfsuid_valid(nd->dir_vfsuid) && vfsuid_eq(nd->dir_vfsuid, vfsuid))
>> + if (vfs_inodes_have_same_owner(idmap, inode, nd))
>> return 0;
>> if (nd->flags & LOOKUP_RCU)
>> @@ -3157,12 +3157,14 @@ EXPORT_SYMBOL(user_path_at);
>> int __check_sticky(struct mnt_idmap *idmap, struct inode *dir,
>> struct inode *inode)
>> {
>> - kuid_t fsuid = current_fsuid();
>> + int ret;
>> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), fsuid))
>> - return 0;
>> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, dir), fsuid))
>> - return 0;
>> + ret = vfs_inode_is_owned_by_me(idmap, inode);
>> + if (ret <= 0)
>> + return ret;
>> + ret = vfs_inode_is_owned_by_me(idmap, dir);
>> + if (ret <= 0)
>> + return ret;
>> return !capable_wrt_inode_uidgid(idmap, inode, CAP_FOWNER);
>> }
>> EXPORT_SYMBOL(__check_sticky);
>> diff --git a/fs/remap_range.c b/fs/remap_range.c
>> index 26afbbbfb10c..9eee93c27001 100644
>> --- a/fs/remap_range.c
>> +++ b/fs/remap_range.c
>> @@ -413,20 +413,22 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
>> EXPORT_SYMBOL(vfs_clone_file_range);
>> /* Check whether we are allowed to dedupe the destination file */
>> -static bool may_dedupe_file(struct file *file)
>> +static int may_dedupe_file(struct file *file)
>> {
>> struct mnt_idmap *idmap = file_mnt_idmap(file);
>> struct inode *inode = file_inode(file);
>> + int ret;
>> if (capable(CAP_SYS_ADMIN))
>> - return true;
>> + return 0;
>> if (file->f_mode & FMODE_WRITE)
>> - return true;
>> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid()))
>> - return true;
>> + return 0;
>> + ret = vfs_inode_is_owned_by_me(idmap, inode);
>> + if (ret <= 0)
>> + return ret;
>> if (!inode_permission(idmap, inode, MAY_WRITE))
>> - return true;
>> - return false;
>> + return 0;
>> + return -EPERM;
>> }
>> loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
>> @@ -459,8 +461,8 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
>> if (ret)
>> return ret;
>> - ret = -EPERM;
>> - if (!may_dedupe_file(dst_file))
>> + ret = may_dedupe_file(dst_file);
>> + if (ret < 0)
>> goto out_drop_write;
>> ret = -EXDEV;
>
> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
>
> This change looks good to me. It should be noted that it assumes that filesystem specific is_owned_by_me inode_operation can properly handle all inode types. The preceding change will need a fix for the afs implementation.
Some further thoughts on this series:
1. Can vfs_inode_is_owned_by_me() be restricted to regular files and symlinks? AFS cannot provide an answer via the PRSFS_ADMINISTER flag for directories and with the exception of __check_sticky() it is never called on a directory inode.
2. __check_sticky() can be changed from calling is_owned_by_me() on both the file and the directory to using is_owned_by_me() for the file and then use a have_same_owner() check to compare the file and directory inodes. Doing so avoids the only vfs_inode_is_owned_by_me() call on a directory inode.
3. Can we add a new vfs_is chown_ok() inode operation? AFS does not use vnode ownership as the determining factor when permitting alteration of the unix owner/group. Only members of the AFS cell’s system:administrators group are permitted to change ownership and the client has no visibility into group memberships so must either always assume its ok to attempt the ownership change and let the fileserver reject it, or the AFS fileserver needs to implement a new rpc or caller access bit to determine if the caller is a system:administrator or otherwise has permission to change object ownership.
Thanks.
Jeffrey Altman
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3929 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS
@ 2025-07-23 15:26 David Howells
0 siblings, 0 replies; 9+ messages in thread
From: David Howells @ 2025-07-23 15:26 UTC (permalink / raw)
To: Al Viro, Christian Brauner
Cc: David Howells, Marc Dionne, Jeffrey Altman, Steve French,
linux-afs, openafs-devel, linux-cifs, linux-nfs, linux-fsdevel,
linux-kernel
Hi Al, Christian,
Here's a pair of fixes that deal with some places the VFS mishandles
foreign user ID checks. By "foreign" I mean that the user IDs from the
filesystem do not belong in the same number space as the system's user IDs.
Network filesystems are prime examples of this, but it may also impact
things like USB drives or cdroms.
Take AFS as example: Whilst each file does have a numeric user ID, the file
may be accessed from a world-accessible public-facing server from some
other organisation with its own idea of what that user ID refers to. IDs
from AFS may also collide with the system's own set of IDs and may also be
unrepresentable as a 32-bit UID (in the case of AuriStor servers).
Further, kAFS uses a key containing an authentication token to specify the
subject doing an RPC operation to the server - and, as such, this needs to
be used instead of current_fsuid() in determining whether the current user
has ownership rights over a file.
Additionally, filesystems (CIFS being a notable example) may also have user
identifiers that aren't simple integers.
Now the problem in the VFS is that there are a number of places where it
assumes it can directly compare i_uid (possibly id-mapped) to either than
on another inode or a UID drawn from elsewhere (e.g. current_uid()) - but
this doesn't work right.
This causes the write-to-sticky check to work incorrectly for AFS (though
this is currently masked by a workaround in bash that is slated to be
removed) whereby open(O_CREAT) of such a file will fail when it shouldn't.
Two patches are provided:
(1) Add a pair of inode operations, one to compare the ownership of a pair
of inodes and the other to see if the current process has ownership
rights over an inode. Usage of this is then extended out into the
VFS, replacing comparisons between i_uid and i_uid and between i_uid
and current_fsuid(). The default, it the inode ops are unimplemented,
is to do those direct i_uid comparisons.
(2) Fixes the bash workaround issue with regard to AFS, overriding the
checks as to whether two inodes have the same owner and the check as
to whether the current user owns an inode to work within the AFS
model.
kAFS uses the result of a status-fetch with a suitable key to determine
file ownership (if the ADMINISTER bit is set) and just compares the 64-bit
owner IDs to determine if two inodes have the same ownership.
Note that chown may also need modifying in some way - but that can't
necessarily supply the information required (for instance, an AuriStor YFS ID
is 64 bits, but chown can only handle a 32-bit integer; CIFS might use a
GUID).
The patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-sticky-2
Thanks,
David
David Howells (2):
vfs: Allow filesystems with foreign owner IDs to override UID checks
afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
Documentation/filesystems/vfs.rst | 23 ++++-
fs/afs/dir.c | 2 +
fs/afs/file.c | 2 +
fs/afs/internal.h | 3 +
fs/afs/security.c | 46 +++++++++
fs/attr.c | 58 ++++++-----
fs/coredump.c | 3 +-
fs/inode.c | 11 +-
fs/internal.h | 1 +
fs/locks.c | 7 +-
fs/namei.c | 161 ++++++++++++++++++++++++------
fs/remap_range.c | 20 ++--
include/linux/fs.h | 6 +-
13 files changed, 270 insertions(+), 73 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS
@ 2025-09-03 12:01 David Howells
0 siblings, 0 replies; 9+ messages in thread
From: David Howells @ 2025-09-03 12:01 UTC (permalink / raw)
To: Al Viro, Christian Brauner
Cc: David Howells, Marc Dionne, Jeffrey Altman, Steve French,
linux-afs, openafs-devel, linux-cifs, linux-nfs, linux-fsdevel,
linux-kernel
Hi Al, Christian,
Here's a pair of fixes that deal with some places the VFS mishandles
foreign user ID checks. By "foreign" I mean that the user IDs from the
filesystem do not belong in the same number space as the system's user IDs.
Network filesystems are prime examples of this, but it may also impact
things like USB drives or cdroms.
Take AFS as example: Whilst each file does have a numeric user ID, the file
may be accessed from a world-accessible public-facing server from some
other organisation with its own idea of what that user ID refers to. IDs
from AFS may also collide with the system's own set of IDs and may also be
unrepresentable as a 32-bit UID (in the case of AuriStor servers).
Further, kAFS uses a key containing an authentication token to specify the
subject doing an RPC operation to the server - and, as such, this needs to
be used instead of current_fsuid() in determining whether the current user
has ownership rights over a file.
Additionally, filesystems (CIFS being a notable example) may also have user
identifiers that aren't simple integers.
Now the problem in the VFS is that there are a number of places where it
assumes it can directly compare i_uid (possibly id-mapped) to either than
on another inode or a UID drawn from elsewhere (e.g. current_uid()) - but
this doesn't work right.
This causes the write-to-sticky check to work incorrectly for AFS (though
this is currently masked by a workaround in bash that is slated to be
removed) whereby open(O_CREAT) of such a file will fail when it shouldn't.
Two patches are provided:
(1) Add a pair of inode operations, one to compare the ownership of a pair
of inodes and the other to see if the current process has ownership
rights over an inode. Usage of this is then extended out into the
VFS, replacing comparisons between i_uid and i_uid and between i_uid
and current_fsuid(). The default, it the inode ops are unimplemented,
is to do those direct i_uid comparisons.
(2) Fixes the bash workaround issue with regard to AFS, overriding the
checks as to whether two inodes have the same owner and the check as
to whether the current user owns an inode to work within the AFS
model.
kAFS uses the result of a status-fetch with a suitable key to determine
file ownership (if the ADMINISTER bit is set) and just compares the 64-bit
owner IDs to determine if two inodes have the same ownership.
Note that chown may also need modifying in some way - but that can't
necessarily supply the information required (for instance, an AuriStor YFS ID
is 64 bits, but chown can only handle a 32-bit integer; CIFS might use a
GUID).
The patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-sticky-2
Thanks,
David
David Howells (2):
vfs: Allow filesystems with foreign owner IDs to override UID checks
afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
Documentation/filesystems/vfs.rst | 21 ++++
fs/afs/dir.c | 2 +
fs/afs/file.c | 2 +
fs/afs/internal.h | 3 +
fs/afs/security.c | 46 +++++++++
fs/attr.c | 58 ++++++-----
fs/coredump.c | 2 +-
fs/inode.c | 11 +-
fs/internal.h | 1 +
fs/locks.c | 7 +-
fs/namei.c | 161 ++++++++++++++++++++++++------
fs/remap_range.c | 20 ++--
include/linux/fs.h | 6 +-
13 files changed, 269 insertions(+), 71 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-09-03 12:02 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-19 16:11 [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS David Howells
2025-05-19 16:11 ` [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir David Howells
2025-05-30 0:11 ` Jeffrey E Altman
2025-05-30 2:42 ` Jeffrey E Altman
2025-05-19 16:11 ` [PATCH 2/2] vfs: Fix inode ownership checks with regard to foreign ownership David Howells
2025-05-30 1:03 ` Jeffrey E Altman
2025-07-21 16:15 ` Jeffrey Altman
-- strict thread matches above, loose matches on Subject: below --
2025-07-23 15:26 [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS David Howells
2025-09-03 12:01 David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).