* [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS
@ 2025-05-19 16:11 David Howells
0 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2025-05-19 16:11 UTC (permalink / raw)
To: Christian Brauner
Cc: David Howells, Marc Dionne, linux-afs, linux-fsdevel,
linux-kernel
Hi Christian,
Here's a pair of fixes that deal with some places the VFS mishandles
foreign user ID checks. By "foreign" I mean that the user IDs from the
filesystem do not belong in the same number space as the system's user IDs.
Network filesystems are prime examples of this, but it may also impact
things like USB drives or cdroms.
Take AFS as example: Whilst each file does have a numeric user ID, the file
may be accessed from a world-accessible public-facing server from some
other organisation with its own idea of what that user ID refers to. IDs
from AFS may also collide with the system's own set of IDs and may also be
unrepresentable as a 32-bit UID (in the case of AuriStor servers).
Further, kAFS uses a key containing an authentication token to specify the
subject doing an RPC operation to the server - and, as such, this needs to
be used instead of current_fsuid() in determining whether the current user
has ownership rights over a file.
Additionally, filesystems (CIFS being a notable example) may also have user
identifiers that aren't simple integers.
Now the problem in the VFS is that there are a number of places where it
assumes it can directly compare i_uid (possibly id-mapped) to either than
on another inode or a UID drawn from elsewhere (e.g. current_uid()) - but
this doesn't work right.
This causes the write-to-sticky check to work incorrectly for AFS (though
this is currently masked by a workaround in bash that is slated to be
removed) whereby open(O_CREAT) of such a file will fail when it shouldn't.
Two patches are provided: The first specifically fixes the bash workaround
issue, delegating the check as to whether two inodes have the same owner
and the check as to whether the current user owns an inode to the
filesystem.
AFS then uses the result of a status-fetch with a suitable key to determine
file ownership and just compares the 64-bit owner IDs to determine if two
inodes have the same ownership.
The second patch expands the use of the VFS helper functions added by the
first to other VFS UID checks.
The patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-fixes
Thanks,
David
David Howells (2):
afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
vfs: Fix inode ownership checks with regard to foreign ownership
fs/afs/dir.c | 2 ++
fs/afs/file.c | 2 ++
fs/afs/internal.h | 3 ++
fs/afs/security.c | 52 ++++++++++++++++++++++++++++++
fs/attr.c | 58 ++++++++++++++++++++-------------
fs/coredump.c | 3 +-
fs/inode.c | 8 +++--
fs/internal.h | 1 +
fs/locks.c | 7 ++--
fs/namei.c | 80 ++++++++++++++++++++++++++++++++--------------
fs/remap_range.c | 20 ++++++------
include/linux/fs.h | 3 ++
12 files changed, 177 insertions(+), 62 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS
@ 2025-07-23 15:26 David Howells
0 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2025-07-23 15:26 UTC (permalink / raw)
To: Al Viro, Christian Brauner
Cc: David Howells, Marc Dionne, Jeffrey Altman, Steve French,
linux-afs, openafs-devel, linux-cifs, linux-nfs, linux-fsdevel,
linux-kernel
Hi Al, Christian,
Here's a pair of fixes that deal with some places the VFS mishandles
foreign user ID checks. By "foreign" I mean that the user IDs from the
filesystem do not belong in the same number space as the system's user IDs.
Network filesystems are prime examples of this, but it may also impact
things like USB drives or cdroms.
Take AFS as example: Whilst each file does have a numeric user ID, the file
may be accessed from a world-accessible public-facing server from some
other organisation with its own idea of what that user ID refers to. IDs
from AFS may also collide with the system's own set of IDs and may also be
unrepresentable as a 32-bit UID (in the case of AuriStor servers).
Further, kAFS uses a key containing an authentication token to specify the
subject doing an RPC operation to the server - and, as such, this needs to
be used instead of current_fsuid() in determining whether the current user
has ownership rights over a file.
Additionally, filesystems (CIFS being a notable example) may also have user
identifiers that aren't simple integers.
Now the problem in the VFS is that there are a number of places where it
assumes it can directly compare i_uid (possibly id-mapped) to either than
on another inode or a UID drawn from elsewhere (e.g. current_uid()) - but
this doesn't work right.
This causes the write-to-sticky check to work incorrectly for AFS (though
this is currently masked by a workaround in bash that is slated to be
removed) whereby open(O_CREAT) of such a file will fail when it shouldn't.
Two patches are provided:
(1) Add a pair of inode operations, one to compare the ownership of a pair
of inodes and the other to see if the current process has ownership
rights over an inode. Usage of this is then extended out into the
VFS, replacing comparisons between i_uid and i_uid and between i_uid
and current_fsuid(). The default, it the inode ops are unimplemented,
is to do those direct i_uid comparisons.
(2) Fixes the bash workaround issue with regard to AFS, overriding the
checks as to whether two inodes have the same owner and the check as
to whether the current user owns an inode to work within the AFS
model.
kAFS uses the result of a status-fetch with a suitable key to determine
file ownership (if the ADMINISTER bit is set) and just compares the 64-bit
owner IDs to determine if two inodes have the same ownership.
Note that chown may also need modifying in some way - but that can't
necessarily supply the information required (for instance, an AuriStor YFS ID
is 64 bits, but chown can only handle a 32-bit integer; CIFS might use a
GUID).
The patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-sticky-2
Thanks,
David
David Howells (2):
vfs: Allow filesystems with foreign owner IDs to override UID checks
afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
Documentation/filesystems/vfs.rst | 23 ++++-
fs/afs/dir.c | 2 +
fs/afs/file.c | 2 +
fs/afs/internal.h | 3 +
fs/afs/security.c | 46 +++++++++
fs/attr.c | 58 ++++++-----
fs/coredump.c | 3 +-
fs/inode.c | 11 +-
fs/internal.h | 1 +
fs/locks.c | 7 +-
fs/namei.c | 161 ++++++++++++++++++++++++------
fs/remap_range.c | 20 ++--
include/linux/fs.h | 6 +-
13 files changed, 270 insertions(+), 73 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS
@ 2025-09-03 12:01 David Howells
0 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2025-09-03 12:01 UTC (permalink / raw)
To: Al Viro, Christian Brauner
Cc: David Howells, Marc Dionne, Jeffrey Altman, Steve French,
linux-afs, openafs-devel, linux-cifs, linux-nfs, linux-fsdevel,
linux-kernel
Hi Al, Christian,
Here's a pair of fixes that deal with some places the VFS mishandles
foreign user ID checks. By "foreign" I mean that the user IDs from the
filesystem do not belong in the same number space as the system's user IDs.
Network filesystems are prime examples of this, but it may also impact
things like USB drives or cdroms.
Take AFS as example: Whilst each file does have a numeric user ID, the file
may be accessed from a world-accessible public-facing server from some
other organisation with its own idea of what that user ID refers to. IDs
from AFS may also collide with the system's own set of IDs and may also be
unrepresentable as a 32-bit UID (in the case of AuriStor servers).
Further, kAFS uses a key containing an authentication token to specify the
subject doing an RPC operation to the server - and, as such, this needs to
be used instead of current_fsuid() in determining whether the current user
has ownership rights over a file.
Additionally, filesystems (CIFS being a notable example) may also have user
identifiers that aren't simple integers.
Now the problem in the VFS is that there are a number of places where it
assumes it can directly compare i_uid (possibly id-mapped) to either than
on another inode or a UID drawn from elsewhere (e.g. current_uid()) - but
this doesn't work right.
This causes the write-to-sticky check to work incorrectly for AFS (though
this is currently masked by a workaround in bash that is slated to be
removed) whereby open(O_CREAT) of such a file will fail when it shouldn't.
Two patches are provided:
(1) Add a pair of inode operations, one to compare the ownership of a pair
of inodes and the other to see if the current process has ownership
rights over an inode. Usage of this is then extended out into the
VFS, replacing comparisons between i_uid and i_uid and between i_uid
and current_fsuid(). The default, it the inode ops are unimplemented,
is to do those direct i_uid comparisons.
(2) Fixes the bash workaround issue with regard to AFS, overriding the
checks as to whether two inodes have the same owner and the check as
to whether the current user owns an inode to work within the AFS
model.
kAFS uses the result of a status-fetch with a suitable key to determine
file ownership (if the ADMINISTER bit is set) and just compares the 64-bit
owner IDs to determine if two inodes have the same ownership.
Note that chown may also need modifying in some way - but that can't
necessarily supply the information required (for instance, an AuriStor YFS ID
is 64 bits, but chown can only handle a 32-bit integer; CIFS might use a
GUID).
The patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-sticky-2
Thanks,
David
David Howells (2):
vfs: Allow filesystems with foreign owner IDs to override UID checks
afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
Documentation/filesystems/vfs.rst | 21 ++++
fs/afs/dir.c | 2 +
fs/afs/file.c | 2 +
fs/afs/internal.h | 3 +
fs/afs/security.c | 46 +++++++++
fs/attr.c | 58 ++++++-----
fs/coredump.c | 2 +-
fs/inode.c | 11 +-
fs/internal.h | 1 +
fs/locks.c | 7 +-
fs/namei.c | 161 ++++++++++++++++++++++++------
fs/remap_range.c | 20 ++--
include/linux/fs.h | 6 +-
13 files changed, 269 insertions(+), 71 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS
@ 2025-10-14 13:35 David Howells
2025-10-14 13:35 ` [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks David Howells
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: David Howells @ 2025-10-14 13:35 UTC (permalink / raw)
To: Al Viro, Christian Brauner
Cc: David Howells, Marc Dionne, Jeffrey Altman, Steve French,
linux-afs, openafs-devel, linux-cifs, linux-nfs, linux-fsdevel,
linux-kernel
Hi Al, Christian,
Here's a pair of fixes that deal with some places the VFS mishandles
foreign user ID checks. By "foreign" I mean that the user IDs from the
filesystem do not belong in the same number space as the system's user IDs.
Network filesystems are prime examples of this, but it may also impact
things like USB drives or cdroms.
Take AFS as example: Whilst each file does have a numeric user ID, the file
may be accessed from a world-accessible public-facing server from some
other organisation with its own idea of what that user ID refers to. IDs
from AFS may also collide with the system's own set of IDs and may also be
unrepresentable as a 32-bit UID (in the case of AuriStor servers).
Further, kAFS uses a key containing an authentication token to specify the
subject doing an RPC operation to the server - and, as such, this needs to
be used instead of current_fsuid() in determining whether the current user
has ownership rights over a file.
Additionally, filesystems (CIFS being a notable example) may also have user
identifiers that aren't simple integers.
Now the problem in the VFS is that there are a number of places where it
assumes it can directly compare i_uid (possibly id-mapped) to either than
on another inode or a UID drawn from elsewhere (e.g. current_uid()) - but
this doesn't work right.
This causes the write-to-sticky check to work incorrectly for AFS (though
this is currently masked by a workaround in bash that is slated to be
removed) whereby open(O_CREAT) of such a file will fail when it shouldn't.
Two patches are provided:
(1) Add a pair of inode operations, one to compare the ownership of a pair
of inodes and the other to see if the current process has ownership
rights over an inode. Usage of this is then extended out into the
VFS, replacing comparisons between i_uid and i_uid and between i_uid
and current_fsuid(). The default, it the inode ops are unimplemented,
is to do those direct i_uid comparisons.
(2) Fixes the bash workaround issue with regard to AFS, overriding the
checks as to whether two inodes have the same owner and the check as
to whether the current user owns an inode to work within the AFS
model.
kAFS uses the result of a status-fetch with a suitable key to determine
file ownership (if the ADMINISTER bit is set) and just compares the 64-bit
owner IDs to determine if two inodes have the same ownership.
Note that chown may also need modifying in some way - but that can't
necessarily supply the information required (for instance, an AuriStor YFS ID
is 64 bits, but chown can only handle a 32-bit integer; CIFS might use a
GUID).
The patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-sticky-2
Thanks,
David
David Howells (2):
vfs: Allow filesystems with foreign owner IDs to override UID checks
afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
Documentation/filesystems/vfs.rst | 21 ++++
fs/afs/dir.c | 2 +
fs/afs/file.c | 2 +
fs/afs/internal.h | 3 +
fs/afs/security.c | 46 +++++++++
fs/attr.c | 58 ++++++-----
fs/coredump.c | 2 +-
fs/inode.c | 11 +-
fs/internal.h | 1 +
fs/locks.c | 7 +-
fs/namei.c | 161 ++++++++++++++++++++++++------
fs/remap_range.c | 20 ++--
include/linux/fs.h | 6 +-
13 files changed, 269 insertions(+), 71 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks
2025-10-14 13:35 [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS David Howells
@ 2025-10-14 13:35 ` David Howells
2025-10-21 12:38 ` Christian Brauner
2025-10-21 13:20 ` David Howells
2025-10-14 13:35 ` [PATCH 2/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir David Howells
2025-10-14 18:27 ` [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS Steve French
2 siblings, 2 replies; 10+ messages in thread
From: David Howells @ 2025-10-14 13:35 UTC (permalink / raw)
To: Al Viro, Christian Brauner
Cc: David Howells, Marc Dionne, Jeffrey Altman, Steve French,
linux-afs, openafs-devel, linux-cifs, linux-nfs, linux-fsdevel,
linux-kernel, Etienne Champetier, Chet Ramey, Cheyenne Wills,
Christian Brauner, Mimi Zohar, linux-integrity
A number of ownership checks made by the VFS make a number of assumptions:
(1) that it is meaningful to compare inode->i_uid to a second ->i_uid or
to current_fsuid(),
(2) that current_fsuid() represents the subject of the action,
(3) that the number in ->i_uid belong to the system's ID space and
(4) that the IDs can be represented by 32-bit integers.
Network filesystems, however, may violate all four of these assumptions.
Indeed, a network filesystem may not even have an actual concept of a UNIX
integer UID (cifs without POSIX extensions, for example). Plug-in block
filesystems (e.g. USB drives) may also violate this assumption.
In particular, AFS implements its own ACL security model with its own
per-cell user ID space with 64-bit IDs for some server variants. The
subject is represented by a token in a key, not current_fsuid(). The AFS
user IDs and the system user IDs for a cell may be numerically equivalent,
but that's matter of administrative policy and should perhaps be noted in
the cell definition or by mount option. A subsequent patch will address
AFS.
To help fix this, three functions are defined to perform UID comparison
within the VFS:
(1) vfs_inode_is_owned_by_me(). This defaults to comparing i_uid to
current_fsuid(), with appropriate namespace mapping, assuming that the
fsuid identifies the subject of the action. The filesystem may
override it by implementing an inode op:
int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
This should return 0 if owned, 1 if not or an error if there's some
sort of lookup failure. It may use a means of identifying the subject
of the action other than fsuid, for example by using an authentication
token stored in a key.
(2) vfs_inodes_have_same_owner(). This defaults to comparing the i_uids
of two different inodes with appropriate namespace mapping. The
filesystem may override it by implementing another inode op:
int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode1,
struct inode *inode2);
Again, this should return 0 if matching, 1 if not or an error if
there's some sort of lookup failure.
(3) vfs_inode_and_dir_have_same_owner(). This is similar to (2), but
assumes that the second inode is the parent directory to the first and
takes a nameidata struct instead of a second inode pointer.
Fix a number of places within the VFS where such UID checks are made that
should be deferring interpretation to the filesystem.
(*) chown_ok()
(*) chgrp_ok()
Call vfs_inode_is_owned_by_me(). Possibly these need to defer all
their checks to the network filesystem as the interpretation of the
new UID/GID depends on the netfs too, but the ->setattr() method gets
a chance to deal with that.
(*) coredump_file()
Call vfs_is_owned_by_me() to check that the file created is owned by
the caller - but the check that's there might be sufficient.
(*) inode_owner_or_capable()
Call vfs_is_owned_by_me(). I'm not sure whether the namespace mapping
makes sense in such a case, but it probably could be used.
(*) vfs_setlease()
Call vfs_is_owned_by_me(). Actually, it should query if leasing is
permitted.
Also, setting locks could perhaps do with a permission call to the
filesystem driver as AFS, for example, has a lock permission bit in
the ACL, but since the AFS server checks that when the RPC call is
made, it's probably unnecessary.
(*) acl_permission_check()
(*) posix_acl_permission()
Unchanged. These functions are only used by generic_permission()
which is overridden if ->permission() is supplied, and when evaluating
a POSIX ACL, it should arguably be checking the UID anyway.
AFS, for example, implements its own ACLs and evaluates them in
->permission() and on the server.
(*) may_follow_link()
Call vfs_inode_and_dir_have_same_owner() and vfs_is_owned_by_me() on
the the link and its parent dir.
(*) may_create_in_sticky()
Call vfs_is_owned_by_me() and also vfs_inode_and_dir_have_same_owner()
both.
[?] Should this return ok immediately if the open call we're in
created the file being checked.
(*) __check_sticky()
Call vfs_is_owned_by_me() on both the dir and the inode, but for AFS
vfs_is_owned_by_me() on a directory doesn't work, so call
vfs_inodes_have_same_owner() instead to check the directory (as is
done in may_create_in_sticky()).
(*) may_dedupe_file()
Call vfs_is_owned_by_me().
(*) IMA policy ops.
Unchanged for now. I'm not sure what the best way to deal with this
is - if, indeed, it needs any changes.
Note that wrapping stuff up into vfs_inode_is_owned_by_me() isn't
necessarily the most efficient as it means we may end up doing the uid
idmapping an extra time - though this is only done in three places, all to
do with world-writable sticky dir checks.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Etienne Champetier <champetier.etienne@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Chet Ramey <chet.ramey@case.edu>
cc: Cheyenne Wills <cwills@sinenomine.net>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Mimi Zohar <zohar@linux.ibm.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-integrity@vger.kernel.org
Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ
Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733
---
Documentation/filesystems/vfs.rst | 21 ++++
fs/attr.c | 58 ++++++-----
fs/coredump.c | 2 +-
fs/inode.c | 11 +-
fs/internal.h | 1 +
fs/locks.c | 7 +-
fs/namei.c | 161 ++++++++++++++++++++++++------
fs/remap_range.c | 20 ++--
include/linux/fs.h | 6 +-
9 files changed, 216 insertions(+), 71 deletions(-)
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 4f13b01e42eb..5acbad3be4fd 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -495,6 +495,9 @@ As of kernel 2.6.22, the following members are defined:
struct dentry *dentry, struct file_kattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct file_kattr *fa);
struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
+ int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
+ int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode,
+ struct dentry *dentry);
};
Again, all methods are called without any locks being held, unless
@@ -679,6 +682,24 @@ otherwise noted.
filesystem must define this operation to use
simple_offset_dir_operations.
+``is_owned_by_me``
+ called to determine if the file can be considered to be 'owned' by
+ the owner of the process or if the process has a token that grants
+ it ownership privileges. If unset, the default is to compare i_uid
+ to current_fsuid() - but this may give incorrect results for some
+ network or plug-in block filesystems. For example, AFS determines
+ ownership entirely according to an obtained token and i_uid may not
+ even be from the same ID space as current_uid().
+
+``have_same_owner``
+ called to determine if an inode has the same owner as its immediate
+ parent on the path walked. If unset, the default is to simply
+ compare the i_uid of both. For example, AFS compares the owner IDs
+ of both - but these are a 64-bit values on some variants that might
+ not fit into a kuid_t and cifs has GUIDs that cannot be compared to
+ kuid_t.
+
+
The Address Space Object
========================
diff --git a/fs/attr.c b/fs/attr.c
index 795f231d00e8..096401a4815d 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -16,6 +16,7 @@
#include <linux/fcntl.h>
#include <linux/filelock.h>
#include <linux/security.h>
+#include "internal.h"
/**
* setattr_should_drop_sgid - determine whether the setgid bit needs to be
@@ -91,19 +92,21 @@ EXPORT_SYMBOL(setattr_should_drop_suidgid);
* permissions. On non-idmapped mounts or if permission checking is to be
* performed on the raw inode simply pass @nop_mnt_idmap.
*/
-static bool chown_ok(struct mnt_idmap *idmap,
- const struct inode *inode, vfsuid_t ia_vfsuid)
+static int chown_ok(struct mnt_idmap *idmap,
+ struct inode *inode, vfsuid_t ia_vfsuid)
{
vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
- if (vfsuid_eq_kuid(vfsuid, current_fsuid()) &&
- vfsuid_eq(ia_vfsuid, vfsuid))
- return true;
+ int ret;
+
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
- return true;
+ return 0;
if (!vfsuid_valid(vfsuid) &&
ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
- return true;
- return false;
+ return 0;
+ return -EPERM;
}
/**
@@ -118,23 +121,27 @@ static bool chown_ok(struct mnt_idmap *idmap,
* permissions. On non-idmapped mounts or if permission checking is to be
* performed on the raw inode simply pass @nop_mnt_idmap.
*/
-static bool chgrp_ok(struct mnt_idmap *idmap,
- const struct inode *inode, vfsgid_t ia_vfsgid)
+static int chgrp_ok(struct mnt_idmap *idmap,
+ struct inode *inode, vfsgid_t ia_vfsgid)
{
vfsgid_t vfsgid = i_gid_into_vfsgid(idmap, inode);
- vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
- if (vfsuid_eq_kuid(vfsuid, current_fsuid())) {
+ int ret;
+
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret < 0)
+ return ret;
+ if (ret == 0) {
if (vfsgid_eq(ia_vfsgid, vfsgid))
- return true;
+ return 0;
if (vfsgid_in_group_p(ia_vfsgid))
- return true;
+ return 0;
}
if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
- return true;
+ return 0;
if (!vfsgid_valid(vfsgid) &&
ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
- return true;
- return false;
+ return 0;
+ return -EPERM;
}
/**
@@ -163,6 +170,7 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
{
struct inode *inode = d_inode(dentry);
unsigned int ia_valid = attr->ia_valid;
+ int ret;
/*
* First check size constraints. These can't be overriden using
@@ -179,14 +187,18 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
goto kill_priv;
/* Make sure a caller can chown. */
- if ((ia_valid & ATTR_UID) &&
- !chown_ok(idmap, inode, attr->ia_vfsuid))
- return -EPERM;
+ if (ia_valid & ATTR_UID) {
+ ret = chown_ok(idmap, inode, attr->ia_vfsuid);
+ if (ret < 0)
+ return ret;
+ }
/* Make sure caller can chgrp. */
- if ((ia_valid & ATTR_GID) &&
- !chgrp_ok(idmap, inode, attr->ia_vfsgid))
- return -EPERM;
+ if (ia_valid & ATTR_GID) {
+ ret = chgrp_ok(idmap, inode, attr->ia_vfsgid);
+ if (ret < 0)
+ return ret;
+ }
/* Make sure a caller can chmod. */
if (ia_valid & ATTR_MODE) {
diff --git a/fs/coredump.c b/fs/coredump.c
index b5fc06a092a4..ac113e41d090 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -951,7 +951,7 @@ static bool coredump_file(struct core_name *cn, struct coredump_params *cprm,
* filesystem.
*/
idmap = file_mnt_idmap(file);
- if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid())) {
+ if (vfs_inode_is_owned_by_me(idmap, inode) != 0) {
coredump_report_failure("Core dump to %s aborted: cannot preserve file owner", cn->corename);
return false;
}
diff --git a/fs/inode.c b/fs/inode.c
index ec9339024ac3..61e6b1d71e86 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2628,16 +2628,19 @@ EXPORT_SYMBOL(inode_init_owner);
* On non-idmapped mounts or if permission checking is to be performed on the
* raw inode simply pass @nop_mnt_idmap.
*/
-bool inode_owner_or_capable(struct mnt_idmap *idmap,
- const struct inode *inode)
+bool inode_owner_or_capable(struct mnt_idmap *idmap, struct inode *inode)
{
vfsuid_t vfsuid;
struct user_namespace *ns;
+ int ret;
- vfsuid = i_uid_into_vfsuid(idmap, inode);
- if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret == 0)
return true;
+ if (ret < 0)
+ return false;
+ vfsuid = i_uid_into_vfsuid(idmap, inode);
ns = current_user_ns();
if (vfsuid_has_mapping(ns, vfsuid) && ns_capable(ns, CAP_FOWNER))
return true;
diff --git a/fs/internal.h b/fs/internal.h
index 9b2b4d116880..29682c6edecd 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -52,6 +52,7 @@ extern int finish_clean_context(struct fs_context *fc);
/*
* namei.c
*/
+int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
struct path *path, const struct path *root);
int do_rmdir(int dfd, struct filename *name);
diff --git a/fs/locks.c b/fs/locks.c
index 04a3f0e20724..b710bf0733b0 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -68,6 +68,7 @@
#include <trace/events/filelock.h>
#include <linux/uaccess.h>
+#include "internal.h"
static struct file_lock *file_lock(struct file_lock_core *flc)
{
@@ -2013,10 +2014,12 @@ int
vfs_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
{
struct inode *inode = file_inode(filp);
- vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
int error;
- if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
+ error = vfs_inode_is_owned_by_me(file_mnt_idmap(filp), inode);
+ if (error < 0)
+ return error;
+ if (error != 0 && !capable(CAP_LEASE))
return -EACCES;
if (!S_ISREG(inode->i_mode))
return -EINVAL;
diff --git a/fs/namei.c b/fs/namei.c
index 7377020a2cba..7dbcb5d50339 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -53,8 +53,8 @@
* The new code replaces the old recursive symlink resolution with
* an iterative one (in case of non-nested symlink chains). It does
* this with calls to <fs>_follow_link().
- * As a side effect, dir_namei(), _namei() and follow_link() are now
- * replaced with a single function lookup_dentry() that can handle all
+ * As a side effect, dir_namei(), _namei() and follow_link() are now
+ * replaced with a single function lookup_dentry() that can handle all
* the special cases of the former code.
*
* With the new dcache, the pathname is stored at each inode, at least as
@@ -1149,6 +1149,72 @@ fs_initcall(init_fs_namei_sysctls);
#endif /* CONFIG_SYSCTL */
+/*
+ * Determine if an inode is owned by the process (allowing for fsuid override),
+ * returning 0 if so, 1 if not and a negative error code if there was a problem
+ * making the determination.
+ */
+int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
+{
+ if (unlikely(inode->i_op->is_owned_by_me))
+ return inode->i_op->is_owned_by_me(idmap, inode);
+ if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid()))
+ return 0;
+ return 1; /* Not same. */
+}
+
+/*
+ * Determine if an inode has the same owner as its parent, returning 0 if so, 1
+ * if not and a negative error code if there was a problem making the
+ * determination.
+ */
+static int vfs_inode_and_dir_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
+ const struct nameidata *nd)
+{
+ if (unlikely(inode->i_op->have_same_owner)) {
+ struct dentry *parent;
+ struct inode *dir;
+ int ret;
+
+ if (inode != nd->inode) {
+ dir = nd->inode;
+ ret = inode->i_op->have_same_owner(idmap, inode, dir);
+ } else if (nd->flags & LOOKUP_RCU) {
+ parent = READ_ONCE(nd->path.dentry);
+ dir = READ_ONCE(parent->d_inode);
+ if (!dir)
+ return -ECHILD;
+ ret = inode->i_op->have_same_owner(idmap, inode, dir);
+ } else {
+ parent = dget_parent(nd->path.dentry);
+ dir = parent->d_inode;
+ ret = inode->i_op->have_same_owner(idmap, inode, dir);
+ dput(parent);
+ }
+ return ret;
+ }
+
+ if (vfsuid_valid(nd->dir_vfsuid) &&
+ vfsuid_eq(i_uid_into_vfsuid(idmap, inode), nd->dir_vfsuid))
+ return 0;
+ return 1; /* Not same. */
+}
+
+/*
+ * Determine if two inodes have the same owner, returning 0 if so, 1 if not and
+ * a negative error code if there was a problem making the determination.
+ */
+static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
+ struct inode *dir)
+{
+ if (unlikely(inode->i_op->have_same_owner))
+ return inode->i_op->have_same_owner(idmap, inode, dir);
+ if (vfsuid_eq(i_uid_into_vfsuid(idmap, inode),
+ i_uid_into_vfsuid(idmap, dir)))
+ return 0;
+ return 1; /* Not same. */
+}
+
/**
* may_follow_link - Check symlink following for unsafe situations
* @nd: nameidata pathwalk data
@@ -1165,27 +1231,28 @@ fs_initcall(init_fs_namei_sysctls);
*
* Returns 0 if following the symlink is allowed, -ve on error.
*/
-static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
+static inline int may_follow_link(struct nameidata *nd, struct inode *inode)
{
struct mnt_idmap *idmap;
- vfsuid_t vfsuid;
+ int ret;
if (!sysctl_protected_symlinks)
return 0;
- idmap = mnt_idmap(nd->path.mnt);
- vfsuid = i_uid_into_vfsuid(idmap, inode);
- /* Allowed if owner and follower match. */
- if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
- return 0;
-
/* Allowed if parent directory not sticky and world-writable. */
if ((nd->dir_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
return 0;
+ idmap = mnt_idmap(nd->path.mnt);
+ /* Allowed if owner and follower match. */
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
+
/* Allowed if parent directory and link owner match. */
- if (vfsuid_valid(nd->dir_vfsuid) && vfsuid_eq(nd->dir_vfsuid, vfsuid))
- return 0;
+ ret = vfs_inode_and_dir_have_same_owner(idmap, inode, nd);
+ if (ret <= 0)
+ return ret;
if (nd->flags & LOOKUP_RCU)
return -ECHILD;
@@ -1283,12 +1350,12 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link)
* @inode: the inode of the file to open
*
* Block an O_CREAT open of a FIFO (or a regular file) when:
- * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled
- * - the file already exists
- * - we are in a sticky directory
- * - we don't own the file
+ * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled,
+ * - the file already exists,
+ * - we are in a sticky directory,
+ * - the directory is world writable,
+ * - we don't own the file and
* - the owner of the directory doesn't own the file
- * - the directory is world writable
* If the sysctl_protected_fifos (or sysctl_protected_regular) is set to 2
* the directory doesn't have to be world writable: being group writable will
* be enough.
@@ -1299,13 +1366,45 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link)
* On non-idmapped mounts or if permission checking is to be performed on the
* raw inode simply pass @nop_mnt_idmap.
*
+ * For a filesystem (e.g. a network filesystem) that has a separate ID space
+ * and has foreign IDs (maybe even non-integer IDs), i_uid cannot be compared
+ * to current_fsuid() and may not be directly comparable to another i_uid.
+ * Instead, the filesystem is asked to perform the comparisons. With network
+ * filesystems, there also exists the possibility of doing anonymous
+ * operations and having anonymously-owned objects.
+ *
+ * We have the following scenarios:
+ *
+ * USER DIR FILE FILE ALLOWED
+ * OWNER OWNER STATE
+ * ======= ======= ======= ======= =======
+ * A A - New Yes
+ * A A A Exists Yes
+ * A A C Exists No
+ * A B - New Yes
+ * A B A Exists Yes, FO==U
+ * A B B Exists Yes, FO==DO
+ * A B C Exists No
+ * A anon[1] - New Yes
+ * A anon[1] A Exists Yes
+ * A anon[1] C Exists No
+ * anon A - New Yes
+ * anon A A Exists Yes, FO==DO
+ * anon anon[1] - New Yes
+ * anon anon[1] - Exists No
+ * anon A A Exists Yes, FO==DO
+ * anon A C Exists No
+ * anon A anon Exists No
+ *
+ * [1] Can anonymously-owned dirs be sticky?
+ *
* Returns 0 if the open is allowed, -ve on error.
*/
static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
- struct inode *const inode)
+ struct inode *inode)
{
umode_t dir_mode = nd->dir_mode;
- vfsuid_t dir_vfsuid = nd->dir_vfsuid, i_vfsuid;
+ int ret;
if (likely(!(dir_mode & S_ISVTX)))
return 0;
@@ -1316,13 +1415,13 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos)
return 0;
- i_vfsuid = i_uid_into_vfsuid(idmap, inode);
-
- if (vfsuid_eq(i_vfsuid, dir_vfsuid))
- return 0;
+ ret = vfs_inode_and_dir_have_same_owner(idmap, inode, nd);
+ if (ret <= 0)
+ return ret;
- if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
- return 0;
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
if (likely(dir_mode & 0002)) {
audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create");
@@ -3222,12 +3321,14 @@ EXPORT_SYMBOL(user_path_at);
int __check_sticky(struct mnt_idmap *idmap, struct inode *dir,
struct inode *inode)
{
- kuid_t fsuid = current_fsuid();
+ int ret;
- if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), fsuid))
- return 0;
- if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, dir), fsuid))
- return 0;
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
+ ret = vfs_inodes_have_same_owner(idmap, inode, dir);
+ if (ret <= 0)
+ return ret;
return !capable_wrt_inode_uidgid(idmap, inode, CAP_FOWNER);
}
EXPORT_SYMBOL(__check_sticky);
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 26afbbbfb10c..9eee93c27001 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -413,20 +413,22 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
EXPORT_SYMBOL(vfs_clone_file_range);
/* Check whether we are allowed to dedupe the destination file */
-static bool may_dedupe_file(struct file *file)
+static int may_dedupe_file(struct file *file)
{
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct inode *inode = file_inode(file);
+ int ret;
if (capable(CAP_SYS_ADMIN))
- return true;
+ return 0;
if (file->f_mode & FMODE_WRITE)
- return true;
- if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid()))
- return true;
+ return 0;
+ ret = vfs_inode_is_owned_by_me(idmap, inode);
+ if (ret <= 0)
+ return ret;
if (!inode_permission(idmap, inode, MAY_WRITE))
- return true;
- return false;
+ return 0;
+ return -EPERM;
}
loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
@@ -459,8 +461,8 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
if (ret)
return ret;
- ret = -EPERM;
- if (!may_dedupe_file(dst_file))
+ ret = may_dedupe_file(dst_file);
+ if (ret < 0)
goto out_drop_write;
ret = -EXDEV;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..f59a7456852f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2104,8 +2104,7 @@ static inline bool sb_start_intwrite_trylock(struct super_block *sb)
return __sb_start_write_trylock(sb, SB_FREEZE_FS);
}
-bool inode_owner_or_capable(struct mnt_idmap *idmap,
- const struct inode *inode);
+bool inode_owner_or_capable(struct mnt_idmap *idmap, struct inode *inode);
/*
* VFS helper functions..
@@ -2376,6 +2375,9 @@ struct inode_operations {
struct dentry *dentry, struct file_kattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct file_kattr *fa);
struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
+ int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
+ int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode1,
+ struct inode *inode2);
} ____cacheline_aligned;
/* Did the driver provide valid mmap hook configuration? */
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
2025-10-14 13:35 [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS David Howells
2025-10-14 13:35 ` [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks David Howells
@ 2025-10-14 13:35 ` David Howells
2025-10-14 18:27 ` [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS Steve French
2 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2025-10-14 13:35 UTC (permalink / raw)
To: Al Viro, Christian Brauner
Cc: David Howells, Marc Dionne, Jeffrey Altman, Steve French,
linux-afs, openafs-devel, linux-cifs, linux-nfs, linux-fsdevel,
linux-kernel, Etienne Champetier, Chet Ramey, Cheyenne Wills,
Christian Brauner
Since version 1.11 (January 1992) Bash has a work around in redir_open()
that causes open(O_CREAT) of a file to be retried without O_CREAT if open()
fails with an EACCES error if bash was built with AFS workarounds
configured:
#if defined (AFS)
if ((fd < 0) && (errno == EACCES))
{
fd = open (filename, flags & ~O_CREAT, mode);
errno = EACCES; /* restore errno */
}
#endif /* AFS */
The ~O_CREAT fallback logic was introduced to workaround a bug[1] in the
IBM AFS 3.1 cache manager and server which can return EACCES in preference
to EEXIST if the requested file exists but the caller is neither granted
explicit PRSFS_READ permission nor is the file owner and is granted
PRSFS_INSERT permission on the directory. IBM AFS 3.2 altered the cache
manager permission checks but failed to correct the permission checks in
the AFS server. As of this writing, all IBM AFS derived servers continue
to return EACCES in preference to EEXIST when these conditions are met.
Bug reports have been filed with all implementations.
As an unintended side effect, the Bash fallback logic also undermines the
Linux kernel protections against O_CREAT opening FIFOs and regular files
not owned by the user in world writeable sticky directories - unless the
owner is the same as that of the directory - as was added in commit
30aba6656f61e ("namei: allow restricted O_CREAT of FIFOs and regular
files").
As a result the Bash fallback logic masks an incompatibility between the
ownership checks performed by may_create_in_sticky() and network
filesystems such as AFS where the uid namespace is disjoint from the uid
namespace of the local system.
However, the bash work around is going to be removed[2].
Fix this in the kernel by using a preceding patch that allows the user ID
comparisons to be overridden by:
(1) Implement the ->is_owned_by_me() inode op for kafs to determine if the
caller owns the file by checking to see if the server indicated the
ADMINISTER bit was set in the access rights returned by the
FS.FetchStatus and suchlike instead of checking the i_uid to
current_fsuid().
Unfortunately, this check doesn't work for directories, but none of
the ops should require that.
Note that anonymous accesses to AFS will never see the ADMINISTER bit
being set and so will not be perceived as owning an anonymously-owned
file.
(2) Implement the ->have_same_owner() inode op, for kafs to compare the
AFS owner IDs retrieved by FS.FetchStatus (which are 64-bit integers
with AuriStor's YFS server and, as such, won't fit in a kuid_t).
Note that whilst an anonymously-owned file will match an
anonymously-owned parent directory, an anonymously-owned directory
cannot have the sticky bit set.
This can be tested by creating a sticky directory (the user must have a
token to do this) and creating a file in it. Then strace bash doing "echo
foo >>file" and look at whether bash does a single, successful O_CREAT open
on the file or whether that one fails and then bash does one without
O_CREAT that succeeds.
Fixes: 30aba6656f61 ("namei: allow restricted O_CREAT of FIFOs and regular files")
Reported-by: Etienne Champetier <champetier.etienne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Chet Ramey <chet.ramey@case.edu>
cc: Cheyenne Wills <cwills@sinenomine.net>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ [1]
Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733 [2]
---
fs/afs/dir.c | 2 ++
fs/afs/file.c | 2 ++
fs/afs/internal.h | 3 +++
fs/afs/security.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 53 insertions(+)
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 89d36e3e5c79..2c0d1d5d4703 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -65,6 +65,8 @@ const struct inode_operations afs_dir_inode_operations = {
.permission = afs_permission,
.getattr = afs_getattr,
.setattr = afs_setattr,
+ .is_owned_by_me = afs_is_owned_by_me,
+ .have_same_owner = afs_have_same_owner,
};
const struct address_space_operations afs_dir_aops = {
diff --git a/fs/afs/file.c b/fs/afs/file.c
index f66a92294284..399a40c45d0a 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -47,6 +47,8 @@ const struct inode_operations afs_file_inode_operations = {
.getattr = afs_getattr,
.setattr = afs_setattr,
.permission = afs_permission,
+ .is_owned_by_me = afs_is_owned_by_me,
+ .have_same_owner = afs_have_same_owner,
};
const struct address_space_operations afs_file_aops = {
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index a45ae5c2ef8a..b9210637e8d7 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1517,6 +1517,9 @@ extern struct key *afs_request_key(struct afs_cell *);
extern struct key *afs_request_key_rcu(struct afs_cell *);
extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *);
extern int afs_permission(struct mnt_idmap *, struct inode *, int);
+int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
+int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode1,
+ struct inode *inode2);
extern void __exit afs_clean_up_permit_cache(void);
/*
diff --git a/fs/afs/security.c b/fs/afs/security.c
index 6a7744c9e2a2..19b11c7cb1ff 100644
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -477,6 +477,52 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode,
return ret;
}
+/*
+ * Determine if an inode is owned by 'me' - whatever that means for the
+ * filesystem. In the case of AFS, this means that the file is owned by the
+ * AFS user represented by the Rx Security Class token held in a key. Returns
+ * 0 if owned by me, 1 if not; can also return an error.
+ */
+int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
+{
+ struct afs_vnode *vnode = AFS_FS_I(inode);
+ afs_access_t access;
+ struct key *key;
+ int ret;
+
+ if (S_ISDIR(inode->i_mode))
+ return 1; /* The ADMIN right check doesn't work for directories. */
+
+ key = afs_request_key(vnode->volume->cell);
+ if (IS_ERR(key))
+ return PTR_ERR(key);
+
+ /* Get the access rights for the key on this file. */
+ ret = afs_check_permit(vnode, key, &access);
+ if (ret < 0)
+ goto error;
+
+ /* We get the ADMINISTER bit if we own the file. */
+ ret = (access & AFS_ACE_ADMINISTER) ? 0 : 1;
+error:
+ key_put(key);
+ return ret;
+}
+
+/*
+ * Determine if a file has the same owner as its parent - whatever that means
+ * for the filesystem. In the case of AFS, this means comparing their AFS
+ * UIDs. Returns 0 if same, 1 if not same; can also return an error.
+ */
+int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode1,
+ struct inode *inode2)
+{
+ const struct afs_vnode *vnode1 = AFS_FS_I(inode1);
+ const struct afs_vnode *vnode2 = AFS_FS_I(inode2);
+
+ return vnode1->status.owner != vnode2->status.owner;
+}
+
void __exit afs_clean_up_permit_cache(void)
{
int i;
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS
2025-10-14 13:35 [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS David Howells
2025-10-14 13:35 ` [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks David Howells
2025-10-14 13:35 ` [PATCH 2/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir David Howells
@ 2025-10-14 18:27 ` Steve French
2 siblings, 0 replies; 10+ messages in thread
From: Steve French @ 2025-10-14 18:27 UTC (permalink / raw)
To: David Howells
Cc: Al Viro, Christian Brauner, Marc Dionne, Jeffrey Altman,
Steve French, linux-afs, openafs-devel, linux-cifs, linux-nfs,
linux-fsdevel, linux-kernel
> Additionally, filesystems (CIFS being a notable example) may also have user
identifiers that aren't simple integers
Yes - this is a bigger problem for cifs.ko (although presumably NFS
since they use user@domain for NFSv4.1 and 4.2, instead of small 32
bit uids, could have some overlapping issues as well).
In the protocols, users are represented (e.g. in ACLs and in ownership
info returned by the server) as globally unique "SIDs" which are much
larger than UIDs and so can be safely mapped across large numbers of
systems.
On Tue, Oct 14, 2025 at 8:36 AM David Howells <dhowells@redhat.com> wrote:
>
> Hi Al, Christian,
>
> Here's a pair of fixes that deal with some places the VFS mishandles
> foreign user ID checks. By "foreign" I mean that the user IDs from the
> filesystem do not belong in the same number space as the system's user IDs.
> Network filesystems are prime examples of this, but it may also impact
> things like USB drives or cdroms.
>
> Take AFS as example: Whilst each file does have a numeric user ID, the file
> may be accessed from a world-accessible public-facing server from some
> other organisation with its own idea of what that user ID refers to. IDs
> from AFS may also collide with the system's own set of IDs and may also be
> unrepresentable as a 32-bit UID (in the case of AuriStor servers).
>
> Further, kAFS uses a key containing an authentication token to specify the
> subject doing an RPC operation to the server - and, as such, this needs to
> be used instead of current_fsuid() in determining whether the current user
> has ownership rights over a file.
>
> Additionally, filesystems (CIFS being a notable example) may also have user
> identifiers that aren't simple integers.
>
> Now the problem in the VFS is that there are a number of places where it
> assumes it can directly compare i_uid (possibly id-mapped) to either than
> on another inode or a UID drawn from elsewhere (e.g. current_uid()) - but
> this doesn't work right.
>
> This causes the write-to-sticky check to work incorrectly for AFS (though
> this is currently masked by a workaround in bash that is slated to be
> removed) whereby open(O_CREAT) of such a file will fail when it shouldn't.
>
> Two patches are provided:
>
> (1) Add a pair of inode operations, one to compare the ownership of a pair
> of inodes and the other to see if the current process has ownership
> rights over an inode. Usage of this is then extended out into the
> VFS, replacing comparisons between i_uid and i_uid and between i_uid
> and current_fsuid(). The default, it the inode ops are unimplemented,
> is to do those direct i_uid comparisons.
>
> (2) Fixes the bash workaround issue with regard to AFS, overriding the
> checks as to whether two inodes have the same owner and the check as
> to whether the current user owns an inode to work within the AFS
> model.
>
> kAFS uses the result of a status-fetch with a suitable key to determine
> file ownership (if the ADMINISTER bit is set) and just compares the 64-bit
> owner IDs to determine if two inodes have the same ownership.
>
> Note that chown may also need modifying in some way - but that can't
> necessarily supply the information required (for instance, an AuriStor YFS ID
> is 64 bits, but chown can only handle a 32-bit integer; CIFS might use a
> GUID).
>
> The patches can be found here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-sticky-2
>
> Thanks,
> David
>
> David Howells (2):
> vfs: Allow filesystems with foreign owner IDs to override UID checks
> afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir
>
> Documentation/filesystems/vfs.rst | 21 ++++
> fs/afs/dir.c | 2 +
> fs/afs/file.c | 2 +
> fs/afs/internal.h | 3 +
> fs/afs/security.c | 46 +++++++++
> fs/attr.c | 58 ++++++-----
> fs/coredump.c | 2 +-
> fs/inode.c | 11 +-
> fs/internal.h | 1 +
> fs/locks.c | 7 +-
> fs/namei.c | 161 ++++++++++++++++++++++++------
> fs/remap_range.c | 20 ++--
> include/linux/fs.h | 6 +-
> 13 files changed, 269 insertions(+), 71 deletions(-)
>
>
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks
2025-10-14 13:35 ` [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks David Howells
@ 2025-10-21 12:38 ` Christian Brauner
2025-10-21 13:20 ` David Howells
1 sibling, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2025-10-21 12:38 UTC (permalink / raw)
To: David Howells
Cc: Al Viro, Christian Brauner, Marc Dionne, Jeffrey Altman,
Steve French, linux-afs, openafs-devel, linux-cifs, linux-nfs,
linux-fsdevel, linux-kernel, Etienne Champetier, Chet Ramey,
Cheyenne Wills, Mimi Zohar, linux-integrity
On Tue, Oct 14, 2025 at 02:35:44PM +0100, David Howells wrote:
> A number of ownership checks made by the VFS make a number of assumptions:
>
> (1) that it is meaningful to compare inode->i_uid to a second ->i_uid or
> to current_fsuid(),
>
> (2) that current_fsuid() represents the subject of the action,
>
> (3) that the number in ->i_uid belong to the system's ID space and
>
> (4) that the IDs can be represented by 32-bit integers.
>
> Network filesystems, however, may violate all four of these assumptions.
> Indeed, a network filesystem may not even have an actual concept of a UNIX
> integer UID (cifs without POSIX extensions, for example). Plug-in block
> filesystems (e.g. USB drives) may also violate this assumption.
>
> In particular, AFS implements its own ACL security model with its own
> per-cell user ID space with 64-bit IDs for some server variants. The
> subject is represented by a token in a key, not current_fsuid(). The AFS
> user IDs and the system user IDs for a cell may be numerically equivalent,
> but that's matter of administrative policy and should perhaps be noted in
> the cell definition or by mount option. A subsequent patch will address
> AFS.
>
> To help fix this, three functions are defined to perform UID comparison
> within the VFS:
>
> (1) vfs_inode_is_owned_by_me(). This defaults to comparing i_uid to
> current_fsuid(), with appropriate namespace mapping, assuming that the
> fsuid identifies the subject of the action. The filesystem may
> override it by implementing an inode op:
>
> int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
>
> This should return 0 if owned, 1 if not or an error if there's some
> sort of lookup failure. It may use a means of identifying the subject
> of the action other than fsuid, for example by using an authentication
> token stored in a key.
>
> (2) vfs_inodes_have_same_owner(). This defaults to comparing the i_uids
> of two different inodes with appropriate namespace mapping. The
> filesystem may override it by implementing another inode op:
>
> int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode1,
> struct inode *inode2);
>
> Again, this should return 0 if matching, 1 if not or an error if
> there's some sort of lookup failure.
>
> (3) vfs_inode_and_dir_have_same_owner(). This is similar to (2), but
> assumes that the second inode is the parent directory to the first and
> takes a nameidata struct instead of a second inode pointer.
>
> Fix a number of places within the VFS where such UID checks are made that
> should be deferring interpretation to the filesystem.
>
> (*) chown_ok()
> (*) chgrp_ok()
>
> Call vfs_inode_is_owned_by_me(). Possibly these need to defer all
> their checks to the network filesystem as the interpretation of the
> new UID/GID depends on the netfs too, but the ->setattr() method gets
> a chance to deal with that.
>
> (*) coredump_file()
>
> Call vfs_is_owned_by_me() to check that the file created is owned by
> the caller - but the check that's there might be sufficient.
>
> (*) inode_owner_or_capable()
>
> Call vfs_is_owned_by_me(). I'm not sure whether the namespace mapping
> makes sense in such a case, but it probably could be used.
>
> (*) vfs_setlease()
>
> Call vfs_is_owned_by_me(). Actually, it should query if leasing is
> permitted.
>
> Also, setting locks could perhaps do with a permission call to the
> filesystem driver as AFS, for example, has a lock permission bit in
> the ACL, but since the AFS server checks that when the RPC call is
> made, it's probably unnecessary.
>
> (*) acl_permission_check()
> (*) posix_acl_permission()
>
> Unchanged. These functions are only used by generic_permission()
> which is overridden if ->permission() is supplied, and when evaluating
> a POSIX ACL, it should arguably be checking the UID anyway.
>
> AFS, for example, implements its own ACLs and evaluates them in
> ->permission() and on the server.
>
> (*) may_follow_link()
>
> Call vfs_inode_and_dir_have_same_owner() and vfs_is_owned_by_me() on
> the the link and its parent dir.
>
> (*) may_create_in_sticky()
>
> Call vfs_is_owned_by_me() and also vfs_inode_and_dir_have_same_owner()
> both.
>
> [?] Should this return ok immediately if the open call we're in
> created the file being checked.
>
> (*) __check_sticky()
>
> Call vfs_is_owned_by_me() on both the dir and the inode, but for AFS
> vfs_is_owned_by_me() on a directory doesn't work, so call
> vfs_inodes_have_same_owner() instead to check the directory (as is
> done in may_create_in_sticky()).
>
> (*) may_dedupe_file()
>
> Call vfs_is_owned_by_me().
>
> (*) IMA policy ops.
>
> Unchanged for now. I'm not sure what the best way to deal with this
> is - if, indeed, it needs any changes.
>
> Note that wrapping stuff up into vfs_inode_is_owned_by_me() isn't
> necessarily the most efficient as it means we may end up doing the uid
> idmapping an extra time - though this is only done in three places, all to
> do with world-writable sticky dir checks.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Etienne Champetier <champetier.etienne@gmail.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Jeffrey Altman <jaltman@auristor.com>
> cc: Chet Ramey <chet.ramey@case.edu>
> cc: Cheyenne Wills <cwills@sinenomine.net>
> cc: Alexander Viro <viro@zeniv.linux.org.uk>
> cc: Christian Brauner <brauner@kernel.org>
> cc: Steve French <sfrench@samba.org>
> cc: Mimi Zohar <zohar@linux.ibm.com>
> cc: linux-afs@lists.infradead.org
> cc: openafs-devel@openafs.org
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-integrity@vger.kernel.org
> Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ
> Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733
> ---
> Documentation/filesystems/vfs.rst | 21 ++++
> fs/attr.c | 58 ++++++-----
> fs/coredump.c | 2 +-
> fs/inode.c | 11 +-
> fs/internal.h | 1 +
> fs/locks.c | 7 +-
> fs/namei.c | 161 ++++++++++++++++++++++++------
> fs/remap_range.c | 20 ++--
> include/linux/fs.h | 6 +-
> 9 files changed, 216 insertions(+), 71 deletions(-)
>
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 4f13b01e42eb..5acbad3be4fd 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -495,6 +495,9 @@ As of kernel 2.6.22, the following members are defined:
> struct dentry *dentry, struct file_kattr *fa);
> int (*fileattr_get)(struct dentry *dentry, struct file_kattr *fa);
> struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
> + int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
> + int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode,
> + struct dentry *dentry);
> };
>
> Again, all methods are called without any locks being held, unless
> @@ -679,6 +682,24 @@ otherwise noted.
> filesystem must define this operation to use
> simple_offset_dir_operations.
>
> +``is_owned_by_me``
> + called to determine if the file can be considered to be 'owned' by
> + the owner of the process or if the process has a token that grants
> + it ownership privileges. If unset, the default is to compare i_uid
> + to current_fsuid() - but this may give incorrect results for some
> + network or plug-in block filesystems. For example, AFS determines
> + ownership entirely according to an obtained token and i_uid may not
> + even be from the same ID space as current_uid().
> +
> +``have_same_owner``
> + called to determine if an inode has the same owner as its immediate
> + parent on the path walked. If unset, the default is to simply
> + compare the i_uid of both. For example, AFS compares the owner IDs
> + of both - but these are a 64-bit values on some variants that might
> + not fit into a kuid_t and cifs has GUIDs that cannot be compared to
> + kuid_t.
> +
> +
> The Address Space Object
> ========================
>
> diff --git a/fs/attr.c b/fs/attr.c
> index 795f231d00e8..096401a4815d 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -16,6 +16,7 @@
> #include <linux/fcntl.h>
> #include <linux/filelock.h>
> #include <linux/security.h>
> +#include "internal.h"
>
> /**
> * setattr_should_drop_sgid - determine whether the setgid bit needs to be
> @@ -91,19 +92,21 @@ EXPORT_SYMBOL(setattr_should_drop_suidgid);
> * permissions. On non-idmapped mounts or if permission checking is to be
> * performed on the raw inode simply pass @nop_mnt_idmap.
> */
> -static bool chown_ok(struct mnt_idmap *idmap,
> - const struct inode *inode, vfsuid_t ia_vfsuid)
> +static int chown_ok(struct mnt_idmap *idmap,
> + struct inode *inode, vfsuid_t ia_vfsuid)
> {
> vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()) &&
> - vfsuid_eq(ia_vfsuid, vfsuid))
> - return true;
> + int ret;
> +
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
> if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
> - return true;
> + return 0;
> if (!vfsuid_valid(vfsuid) &&
> ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
> - return true;
> - return false;
> + return 0;
> + return -EPERM;
> }
>
> /**
> @@ -118,23 +121,27 @@ static bool chown_ok(struct mnt_idmap *idmap,
> * permissions. On non-idmapped mounts or if permission checking is to be
> * performed on the raw inode simply pass @nop_mnt_idmap.
> */
> -static bool chgrp_ok(struct mnt_idmap *idmap,
> - const struct inode *inode, vfsgid_t ia_vfsgid)
> +static int chgrp_ok(struct mnt_idmap *idmap,
> + struct inode *inode, vfsgid_t ia_vfsgid)
> {
> vfsgid_t vfsgid = i_gid_into_vfsgid(idmap, inode);
> - vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
> - if (vfsuid_eq_kuid(vfsuid, current_fsuid())) {
> + int ret;
> +
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret < 0)
> + return ret;
> + if (ret == 0) {
> if (vfsgid_eq(ia_vfsgid, vfsgid))
> - return true;
> + return 0;
> if (vfsgid_in_group_p(ia_vfsgid))
> - return true;
> + return 0;
> }
> if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN))
> - return true;
> + return 0;
> if (!vfsgid_valid(vfsgid) &&
> ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
> - return true;
> - return false;
> + return 0;
> + return -EPERM;
> }
>
> /**
> @@ -163,6 +170,7 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
> {
> struct inode *inode = d_inode(dentry);
> unsigned int ia_valid = attr->ia_valid;
> + int ret;
>
> /*
> * First check size constraints. These can't be overriden using
> @@ -179,14 +187,18 @@ int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
> goto kill_priv;
>
> /* Make sure a caller can chown. */
> - if ((ia_valid & ATTR_UID) &&
> - !chown_ok(idmap, inode, attr->ia_vfsuid))
> - return -EPERM;
> + if (ia_valid & ATTR_UID) {
> + ret = chown_ok(idmap, inode, attr->ia_vfsuid);
> + if (ret < 0)
> + return ret;
> + }
>
> /* Make sure caller can chgrp. */
> - if ((ia_valid & ATTR_GID) &&
> - !chgrp_ok(idmap, inode, attr->ia_vfsgid))
> - return -EPERM;
> + if (ia_valid & ATTR_GID) {
> + ret = chgrp_ok(idmap, inode, attr->ia_vfsgid);
> + if (ret < 0)
> + return ret;
> + }
>
> /* Make sure a caller can chmod. */
> if (ia_valid & ATTR_MODE) {
> diff --git a/fs/coredump.c b/fs/coredump.c
> index b5fc06a092a4..ac113e41d090 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -951,7 +951,7 @@ static bool coredump_file(struct core_name *cn, struct coredump_params *cprm,
> * filesystem.
> */
> idmap = file_mnt_idmap(file);
> - if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid())) {
> + if (vfs_inode_is_owned_by_me(idmap, inode) != 0) {
> coredump_report_failure("Core dump to %s aborted: cannot preserve file owner", cn->corename);
> return false;
> }
> diff --git a/fs/inode.c b/fs/inode.c
> index ec9339024ac3..61e6b1d71e86 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2628,16 +2628,19 @@ EXPORT_SYMBOL(inode_init_owner);
> * On non-idmapped mounts or if permission checking is to be performed on the
> * raw inode simply pass @nop_mnt_idmap.
> */
> -bool inode_owner_or_capable(struct mnt_idmap *idmap,
> - const struct inode *inode)
> +bool inode_owner_or_capable(struct mnt_idmap *idmap, struct inode *inode)
> {
> vfsuid_t vfsuid;
> struct user_namespace *ns;
> + int ret;
>
> - vfsuid = i_uid_into_vfsuid(idmap, inode);
> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret == 0)
> return true;
> + if (ret < 0)
> + return false;
>
> + vfsuid = i_uid_into_vfsuid(idmap, inode);
> ns = current_user_ns();
> if (vfsuid_has_mapping(ns, vfsuid) && ns_capable(ns, CAP_FOWNER))
> return true;
> diff --git a/fs/internal.h b/fs/internal.h
> index 9b2b4d116880..29682c6edecd 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -52,6 +52,7 @@ extern int finish_clean_context(struct fs_context *fc);
> /*
> * namei.c
> */
> +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
> extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> struct path *path, const struct path *root);
> int do_rmdir(int dfd, struct filename *name);
> diff --git a/fs/locks.c b/fs/locks.c
> index 04a3f0e20724..b710bf0733b0 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -68,6 +68,7 @@
> #include <trace/events/filelock.h>
>
> #include <linux/uaccess.h>
> +#include "internal.h"
>
> static struct file_lock *file_lock(struct file_lock_core *flc)
> {
> @@ -2013,10 +2014,12 @@ int
> vfs_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
> {
> struct inode *inode = file_inode(filp);
> - vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
> int error;
>
> - if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
> + error = vfs_inode_is_owned_by_me(file_mnt_idmap(filp), inode);
> + if (error < 0)
> + return error;
> + if (error != 0 && !capable(CAP_LEASE))
> return -EACCES;
> if (!S_ISREG(inode->i_mode))
> return -EINVAL;
> diff --git a/fs/namei.c b/fs/namei.c
> index 7377020a2cba..7dbcb5d50339 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -53,8 +53,8 @@
> * The new code replaces the old recursive symlink resolution with
> * an iterative one (in case of non-nested symlink chains). It does
> * this with calls to <fs>_follow_link().
> - * As a side effect, dir_namei(), _namei() and follow_link() are now
> - * replaced with a single function lookup_dentry() that can handle all
> + * As a side effect, dir_namei(), _namei() and follow_link() are now
> + * replaced with a single function lookup_dentry() that can handle all
> * the special cases of the former code.
> *
> * With the new dcache, the pathname is stored at each inode, at least as
> @@ -1149,6 +1149,72 @@ fs_initcall(init_fs_namei_sysctls);
>
> #endif /* CONFIG_SYSCTL */
>
> +/*
> + * Determine if an inode is owned by the process (allowing for fsuid override),
> + * returning 0 if so, 1 if not and a negative error code if there was a problem
> + * making the determination.
> + */
> +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
> +{
> + if (unlikely(inode->i_op->is_owned_by_me))
> + return inode->i_op->is_owned_by_me(idmap, inode);
Similar to IOP_FASTPERM this should use a flag to avoid pointer derefs.
> + if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid()))
> + return 0;
> + return 1; /* Not same. */
> +}
> +
> +/*
> + * Determine if an inode has the same owner as its parent, returning 0 if so, 1
> + * if not and a negative error code if there was a problem making the
> + * determination.
> + */
> +static int vfs_inode_and_dir_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> + const struct nameidata *nd)
> +{
> + if (unlikely(inode->i_op->have_same_owner)) {
Same, as above: similar to IOP_FASTPERM this should use a flag to avoid pointer derefs.
Really, we should very properly bias this towards the common case where
the filesystem will not have a custom ownership comparison callback at all.
> + struct dentry *parent;
> + struct inode *dir;
> + int ret;
> +
> + if (inode != nd->inode) {
> + dir = nd->inode;
> + ret = inode->i_op->have_same_owner(idmap, inode, dir);
> + } else if (nd->flags & LOOKUP_RCU) {
> + parent = READ_ONCE(nd->path.dentry);
> + dir = READ_ONCE(parent->d_inode);
> + if (!dir)
> + return -ECHILD;
> + ret = inode->i_op->have_same_owner(idmap, inode, dir);
> + } else {
> + parent = dget_parent(nd->path.dentry);
> + dir = parent->d_inode;
> + ret = inode->i_op->have_same_owner(idmap, inode, dir);
> + dput(parent);
> + }
> + return ret;
> + }
This about as ugly as it can get and costly...
> +
> + if (vfsuid_valid(nd->dir_vfsuid) &&
> + vfsuid_eq(i_uid_into_vfsuid(idmap, inode), nd->dir_vfsuid))
> + return 0;
> + return 1; /* Not same. */
Why is this returning 1? This will cause may_follow_link() to return 1
and that will bubble up as an error code? So shouldn't this return a
negative error code instead?
> +}
> +
> +/*
> + * Determine if two inodes have the same owner, returning 0 if so, 1 if not and
> + * a negative error code if there was a problem making the determination.
> + */
> +static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> + struct inode *dir)
> +{
> + if (unlikely(inode->i_op->have_same_owner))
> + return inode->i_op->have_same_owner(idmap, inode, dir);
> + if (vfsuid_eq(i_uid_into_vfsuid(idmap, inode),
> + i_uid_into_vfsuid(idmap, dir)))
> + return 0;
> + return 1; /* Not same. */
> +}
> +
> /**
> * may_follow_link - Check symlink following for unsafe situations
> * @nd: nameidata pathwalk data
> @@ -1165,27 +1231,28 @@ fs_initcall(init_fs_namei_sysctls);
> *
> * Returns 0 if following the symlink is allowed, -ve on error.
> */
> -static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
> +static inline int may_follow_link(struct nameidata *nd, struct inode *inode)
> {
> struct mnt_idmap *idmap;
> - vfsuid_t vfsuid;
> + int ret;
>
> if (!sysctl_protected_symlinks)
> return 0;
>
> - idmap = mnt_idmap(nd->path.mnt);
> - vfsuid = i_uid_into_vfsuid(idmap, inode);
> - /* Allowed if owner and follower match. */
> - if (vfsuid_eq_kuid(vfsuid, current_fsuid()))
> - return 0;
> -
> /* Allowed if parent directory not sticky and world-writable. */
> if ((nd->dir_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
> return 0;
>
> + idmap = mnt_idmap(nd->path.mnt);
> + /* Allowed if owner and follower match. */
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
> +
> /* Allowed if parent directory and link owner match. */
> - if (vfsuid_valid(nd->dir_vfsuid) && vfsuid_eq(nd->dir_vfsuid, vfsuid))
> - return 0;
> + ret = vfs_inode_and_dir_have_same_owner(idmap, inode, nd);
> + if (ret <= 0)
> + return ret;
Ok, so while that doesn't exactly surface the error it's still weird.
Please make that consistent. Either have those two new helper functions
return negative error codes and zero on success or have it be a proper
boolean instead so there's no possible confusion. This is just begging
for someone to do if (ret) return ret and bubble up that positive return
value.
>
> if (nd->flags & LOOKUP_RCU)
> return -ECHILD;
> @@ -1283,12 +1350,12 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link)
> * @inode: the inode of the file to open
> *
> * Block an O_CREAT open of a FIFO (or a regular file) when:
> - * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled
> - * - the file already exists
> - * - we are in a sticky directory
> - * - we don't own the file
> + * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled,
> + * - the file already exists,
> + * - we are in a sticky directory,
> + * - the directory is world writable,
> + * - we don't own the file and
> * - the owner of the directory doesn't own the file
> - * - the directory is world writable
> * If the sysctl_protected_fifos (or sysctl_protected_regular) is set to 2
> * the directory doesn't have to be world writable: being group writable will
> * be enough.
> @@ -1299,13 +1366,45 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link)
> * On non-idmapped mounts or if permission checking is to be performed on the
> * raw inode simply pass @nop_mnt_idmap.
> *
> + * For a filesystem (e.g. a network filesystem) that has a separate ID space
> + * and has foreign IDs (maybe even non-integer IDs), i_uid cannot be compared
> + * to current_fsuid() and may not be directly comparable to another i_uid.
> + * Instead, the filesystem is asked to perform the comparisons. With network
> + * filesystems, there also exists the possibility of doing anonymous
> + * operations and having anonymously-owned objects.
> + *
> + * We have the following scenarios:
> + *
> + * USER DIR FILE FILE ALLOWED
> + * OWNER OWNER STATE
> + * ======= ======= ======= ======= =======
> + * A A - New Yes
> + * A A A Exists Yes
> + * A A C Exists No
> + * A B - New Yes
> + * A B A Exists Yes, FO==U
> + * A B B Exists Yes, FO==DO
> + * A B C Exists No
> + * A anon[1] - New Yes
> + * A anon[1] A Exists Yes
> + * A anon[1] C Exists No
> + * anon A - New Yes
> + * anon A A Exists Yes, FO==DO
> + * anon anon[1] - New Yes
> + * anon anon[1] - Exists No
> + * anon A A Exists Yes, FO==DO
> + * anon A C Exists No
> + * anon A anon Exists No
> + *
> + * [1] Can anonymously-owned dirs be sticky?
> + *
> * Returns 0 if the open is allowed, -ve on error.
> */
> static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
> - struct inode *const inode)
> + struct inode *inode)
> {
> umode_t dir_mode = nd->dir_mode;
> - vfsuid_t dir_vfsuid = nd->dir_vfsuid, i_vfsuid;
> + int ret;
>
> if (likely(!(dir_mode & S_ISVTX)))
> return 0;
> @@ -1316,13 +1415,13 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
> if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos)
> return 0;
>
> - i_vfsuid = i_uid_into_vfsuid(idmap, inode);
> -
> - if (vfsuid_eq(i_vfsuid, dir_vfsuid))
> - return 0;
> + ret = vfs_inode_and_dir_have_same_owner(idmap, inode, nd);
> + if (ret <= 0)
> + return ret;
>
> - if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
> - return 0;
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
>
> if (likely(dir_mode & 0002)) {
> audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create");
> @@ -3222,12 +3321,14 @@ EXPORT_SYMBOL(user_path_at);
> int __check_sticky(struct mnt_idmap *idmap, struct inode *dir,
> struct inode *inode)
> {
> - kuid_t fsuid = current_fsuid();
> + int ret;
>
> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), fsuid))
> - return 0;
> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, dir), fsuid))
> - return 0;
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
> + ret = vfs_inodes_have_same_owner(idmap, inode, dir);
> + if (ret <= 0)
> + return ret;
> return !capable_wrt_inode_uidgid(idmap, inode, CAP_FOWNER);
> }
> EXPORT_SYMBOL(__check_sticky);
> diff --git a/fs/remap_range.c b/fs/remap_range.c
> index 26afbbbfb10c..9eee93c27001 100644
> --- a/fs/remap_range.c
> +++ b/fs/remap_range.c
> @@ -413,20 +413,22 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> EXPORT_SYMBOL(vfs_clone_file_range);
>
> /* Check whether we are allowed to dedupe the destination file */
> -static bool may_dedupe_file(struct file *file)
> +static int may_dedupe_file(struct file *file)
> {
> struct mnt_idmap *idmap = file_mnt_idmap(file);
> struct inode *inode = file_inode(file);
> + int ret;
>
> if (capable(CAP_SYS_ADMIN))
> - return true;
> + return 0;
> if (file->f_mode & FMODE_WRITE)
> - return true;
> - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid()))
> - return true;
> + return 0;
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
> if (!inode_permission(idmap, inode, MAY_WRITE))
> - return true;
> - return false;
> + return 0;
> + return -EPERM;
> }
>
> loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
> @@ -459,8 +461,8 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
> if (ret)
> return ret;
>
> - ret = -EPERM;
> - if (!may_dedupe_file(dst_file))
> + ret = may_dedupe_file(dst_file);
> + if (ret < 0)
> goto out_drop_write;
>
> ret = -EXDEV;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..f59a7456852f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2104,8 +2104,7 @@ static inline bool sb_start_intwrite_trylock(struct super_block *sb)
> return __sb_start_write_trylock(sb, SB_FREEZE_FS);
> }
>
> -bool inode_owner_or_capable(struct mnt_idmap *idmap,
> - const struct inode *inode);
> +bool inode_owner_or_capable(struct mnt_idmap *idmap, struct inode *inode);
>
> /*
> * VFS helper functions..
> @@ -2376,6 +2375,9 @@ struct inode_operations {
> struct dentry *dentry, struct file_kattr *fa);
> int (*fileattr_get)(struct dentry *dentry, struct file_kattr *fa);
> struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
> + int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
> + int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode1,
> + struct inode *inode2);
> } ____cacheline_aligned;
>
> /* Did the driver provide valid mmap hook configuration? */
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks
2025-10-14 13:35 ` [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks David Howells
2025-10-21 12:38 ` Christian Brauner
@ 2025-10-21 13:20 ` David Howells
2025-10-29 12:39 ` Christian Brauner
1 sibling, 1 reply; 10+ messages in thread
From: David Howells @ 2025-10-21 13:20 UTC (permalink / raw)
To: Christian Brauner
Cc: dhowells, Al Viro, Christian Brauner, Marc Dionne, Jeffrey Altman,
Steve French, linux-afs, openafs-devel, linux-cifs, linux-nfs,
linux-fsdevel, linux-kernel, Etienne Champetier, Chet Ramey,
Cheyenne Wills, Mimi Zohar, linux-integrity
Christian Brauner <brauner@kernel.org> wrote:
> > + if (unlikely(inode->i_op->have_same_owner)) {
>
> Same, as above: similar to IOP_FASTPERM this should use a flag to avoid pointer derefs.
Can we do these IOP_* flags better? Surely we can determine at the point the
inode has its ->i_op assigned that these things are provided? This optimises
the case where they don't exist at the expense of the case where they do (we
still have to check the pointer every time).
> > + if (unlikely(inode->i_op->have_same_owner)) {
>
> Same, as above: similar to IOP_FASTPERM this should use a flag to avoid pointer derefs.
>
> Really, we should very properly bias this towards the common case where
> the filesystem will not have a custom ownership comparison callback at all.
Hence the unlikely().
> > + struct dentry *parent;
> > + struct inode *dir;
> > + int ret;
> > +
> > + if (inode != nd->inode) {
> > + dir = nd->inode;
> > + ret = inode->i_op->have_same_owner(idmap, inode, dir);
> > + } else if (nd->flags & LOOKUP_RCU) {
> > + parent = READ_ONCE(nd->path.dentry);
> > + dir = READ_ONCE(parent->d_inode);
> > + if (!dir)
> > + return -ECHILD;
> > + ret = inode->i_op->have_same_owner(idmap, inode, dir);
> > + } else {
> > + parent = dget_parent(nd->path.dentry);
> > + dir = parent->d_inode;
> > + ret = inode->i_op->have_same_owner(idmap, inode, dir);
> > + dput(parent);
> > + }
> > + return ret;
> > + }
>
> This about as ugly as it can get and costly...
I can break this out into a helper, but it should make no difference to the
actual code generated.
> > + ret = vfs_inode_and_dir_have_same_owner(idmap, inode, nd);
> > + if (ret <= 0)
> > + return ret;
>
> Ok, so while that doesn't exactly surface the error it's still weird.
> Please make that consistent. Either have those two new helper functions
> return negative error codes and zero on success or have it be a proper
> boolean instead so there's no possible confusion. This is just begging
> for someone to do if (ret) return ret and bubble up that positive return
> value.
The problem is that you have three available returns: Yes they do, no they
don't and some arbitrary error was encountered. The first two are not error
cases, and potentially any error you pick to represent, say, "no" could also
be returned by the underlying filesystem.
David
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks
2025-10-21 13:20 ` David Howells
@ 2025-10-29 12:39 ` Christian Brauner
0 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2025-10-29 12:39 UTC (permalink / raw)
To: David Howells
Cc: Al Viro, Christian Brauner, Marc Dionne, Jeffrey Altman,
Steve French, linux-afs, openafs-devel, linux-cifs, linux-nfs,
linux-fsdevel, linux-kernel, Etienne Champetier, Chet Ramey,
Cheyenne Wills, Mimi Zohar, linux-integrity
On Tue, Oct 21, 2025 at 02:20:21PM +0100, David Howells wrote:
> Christian Brauner <brauner@kernel.org> wrote:
>
> > > + if (unlikely(inode->i_op->have_same_owner)) {
> >
> > Same, as above: similar to IOP_FASTPERM this should use a flag to avoid pointer derefs.
>
> Can we do these IOP_* flags better? Surely we can determine at the point the
> inode has its ->i_op assigned that these things are provided? This optimises
> the case where they don't exist at the expense of the case where they do (we
> still have to check the pointer every time).
>
> > > + if (unlikely(inode->i_op->have_same_owner)) {
I think I mentioned this off-list. It looks like we can but I don't know
if there was any history behind not doing it that way. But please try.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-10-29 12:39 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-14 13:35 [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS David Howells
2025-10-14 13:35 ` [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks David Howells
2025-10-21 12:38 ` Christian Brauner
2025-10-21 13:20 ` David Howells
2025-10-29 12:39 ` Christian Brauner
2025-10-14 13:35 ` [PATCH 2/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir David Howells
2025-10-14 18:27 ` [PATCH 0/2] vfs, afs, bash: Fix miscomparison of foreign user IDs in the VFS Steve French
-- strict thread matches above, loose matches on Subject: below --
2025-09-03 12:01 David Howells
2025-07-23 15:26 David Howells
2025-05-19 16:11 David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).