* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize [not found] <20260624101436.362533-1-cem@kernel.org> @ 2026-06-24 13:40 ` Christoph Hellwig 2026-06-24 18:59 ` Carlos Maiolino 2026-06-25 10:43 ` Jan Kara 0 siblings, 2 replies; 8+ messages in thread From: Christoph Hellwig @ 2026-06-24 13:40 UTC (permalink / raw) To: cem Cc: linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen, Christoph Hellwig, Dr. Thomas Orgis, Jan Kara, linux-fsdevel, Christian Brauner Adding Jan and Christian for quota and user_ns knowledge. On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote: > From: Carlos Maiolino <cem@kernel.org> > > An user reported a bug where he managed to evade group's quota > by changing a file's gid to a different group id the same user > belonged to, even though quotas were enforced on both gids and the > file's size was big enough to exceed the quota's hardlimit. > > Commit eba0549bc7d1 replaced a capable() call by a > has_capability_noaudit() to prevent unnecessary selinux audit messages. > Turns out that both calls have slightly different semantics even though > their documentation seems similar. Where in a nutshell: > > capable() - Tests the task's effective credentials > has_ns_capability_noaudit() - Tests the task's real credentials Eww.. > This most of the time has no practical difference but in some cases like > changing attrs (specifically group id in this case) through a NFS client > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively > bypassing quota accounting checks. Yeah, this does look wrong. Do the other conversion in the above commit have tthe same issue? > Using instead ns_capable_noaudit() should fix this issue and prevent > selinux audit messages. The generic quota code manages to do without either has_capability_noaudit or ns_capable_noaudit. I think this might be hidden behind inode_owner_or_capable calls. Any idea why we're different? > + bool force = ns_capable_noaudit(&init_user_ns, > + CAP_FOWNER); I'd do away with the local variable here. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize 2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig @ 2026-06-24 18:59 ` Carlos Maiolino 2026-06-25 10:43 ` Jan Kara 1 sibling, 0 replies; 8+ messages in thread From: Carlos Maiolino @ 2026-06-24 18:59 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, Jan Kara, linux-fsdevel, Christian Brauner On Wed, Jun 24, 2026 at 03:40:39PM +0200, Christoph Hellwig wrote: > Adding Jan and Christian for quota and user_ns knowledge. > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote: > > From: Carlos Maiolino <cem@kernel.org> > > > > An user reported a bug where he managed to evade group's quota > > by changing a file's gid to a different group id the same user > > belonged to, even though quotas were enforced on both gids and the > > file's size was big enough to exceed the quota's hardlimit. > > > > Commit eba0549bc7d1 replaced a capable() call by a > > has_capability_noaudit() to prevent unnecessary selinux audit messages. > > Turns out that both calls have slightly different semantics even though > > their documentation seems similar. Where in a nutshell: > > > > capable() - Tests the task's effective credentials > > has_ns_capability_noaudit() - Tests the task's real credentials > > Eww.. > > > This most of the time has no practical difference but in some cases like > > changing attrs (specifically group id in this case) through a NFS client > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively > > bypassing quota accounting checks. > > Yeah, this does look wrong. Do the other conversion in the above commit > have tthe same issue? I don't know yet, but I do hope to be able to convert everything back to a capable_noaudit() helper. I need some time to look at the other conversions, but giving the different credentials likely the are all victim of the same issue. > > > Using instead ns_capable_noaudit() should fix this issue and prevent > > selinux audit messages. > > The generic quota code manages to do without either has_capability_noaudit > or ns_capable_noaudit. I think this might be hidden behind > inode_owner_or_capable calls. Any idea why we're different? Yes, I looked into that to check the feasibility to use the generic code. This specific path goes through __dquot_transfer() which uses ignore_hardlimit() helper to decide between bypassing quota enforcement or not. The later though uses capable(CAP_SYS_RESOURCE) which would trigger selinux audit messages too at least for our case. This though made me wonder if we shouldn't be using CAP_SYS_RESOURCE in lieu of CAP_FOWNER in this path which seems more appropriate to bypass quota enforcement. Either way though, I think we should at least have an initial simple to backport patch to older releases. > > > + bool force = ns_capable_noaudit(&init_user_ns, > > + CAP_FOWNER); > > I'd do away with the local variable here. Sounds find to me. I wanted to avoid the function call with two arguments as an argument. But I honestly have no strong preference. Cheers! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize 2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig 2026-06-24 18:59 ` Carlos Maiolino @ 2026-06-25 10:43 ` Jan Kara 2026-06-25 11:01 ` Carlos Maiolino 2026-06-25 12:37 ` Christoph Hellwig 1 sibling, 2 replies; 8+ messages in thread From: Jan Kara @ 2026-06-25 10:43 UTC (permalink / raw) To: Christoph Hellwig Cc: cem, linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, Jan Kara, linux-fsdevel, Christian Brauner On Wed 24-06-26 15:40:39, Christoph Hellwig wrote: > Adding Jan and Christian for quota and user_ns knowledge. > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote: > > From: Carlos Maiolino <cem@kernel.org> > > > > An user reported a bug where he managed to evade group's quota > > by changing a file's gid to a different group id the same user > > belonged to, even though quotas were enforced on both gids and the > > file's size was big enough to exceed the quota's hardlimit. > > > > Commit eba0549bc7d1 replaced a capable() call by a > > has_capability_noaudit() to prevent unnecessary selinux audit messages. > > Turns out that both calls have slightly different semantics even though > > their documentation seems similar. Where in a nutshell: > > > > capable() - Tests the task's effective credentials > > has_ns_capability_noaudit() - Tests the task's real credentials > > Eww.. Yeah, that's a catch. > > This most of the time has no practical difference but in some cases like > > changing attrs (specifically group id in this case) through a NFS client > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively > > bypassing quota accounting checks. > > Yeah, this does look wrong. Do the other conversion in the above commit > have tthe same issue? > > > Using instead ns_capable_noaudit() should fix this issue and prevent > > selinux audit messages. > > The generic quota code manages to do without either has_capability_noaudit > or ns_capable_noaudit. I think this might be hidden behind > inode_owner_or_capable calls. Any idea why we're different? Actually no. Generic quota code has equivalent checks in ignore_hardlimit() function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why nobody complained about generic quota code is that we call ignore_hardlimit() only if we are above hardlimit whereas XFS calls this for every transaction... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize 2026-06-25 10:43 ` Jan Kara @ 2026-06-25 11:01 ` Carlos Maiolino 2026-06-25 12:37 ` Christoph Hellwig 1 sibling, 0 replies; 8+ messages in thread From: Carlos Maiolino @ 2026-06-25 11:01 UTC (permalink / raw) To: Jan Kara Cc: Christoph Hellwig, linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote: > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote: > > Adding Jan and Christian for quota and user_ns knowledge. > > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote: > > > From: Carlos Maiolino <cem@kernel.org> > > > > > > An user reported a bug where he managed to evade group's quota > > > by changing a file's gid to a different group id the same user > > > belonged to, even though quotas were enforced on both gids and the > > > file's size was big enough to exceed the quota's hardlimit. > > > > > > Commit eba0549bc7d1 replaced a capable() call by a > > > has_capability_noaudit() to prevent unnecessary selinux audit messages. > > > Turns out that both calls have slightly different semantics even though > > > their documentation seems similar. Where in a nutshell: > > > > > > capable() - Tests the task's effective credentials > > > has_ns_capability_noaudit() - Tests the task's real credentials > > > > Eww.. > > Yeah, that's a catch. > > > > This most of the time has no practical difference but in some cases like > > > changing attrs (specifically group id in this case) through a NFS client > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively > > > bypassing quota accounting checks. > > > > Yeah, this does look wrong. Do the other conversion in the above commit > > have tthe same issue? > > > > > Using instead ns_capable_noaudit() should fix this issue and prevent > > > selinux audit messages. > > > > The generic quota code manages to do without either has_capability_noaudit > > or ns_capable_noaudit. I think this might be hidden behind > > inode_owner_or_capable calls. Any idea why we're different? > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit() > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why > nobody complained about generic quota code is that we call > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this > for every transaction... But the capable() call works fine, it's the replacement by has_capability_noaudit() that enables quota bypass I *think* generic quota code is safe from this point. > > Honza > > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize 2026-06-25 10:43 ` Jan Kara 2026-06-25 11:01 ` Carlos Maiolino @ 2026-06-25 12:37 ` Christoph Hellwig 2026-06-25 15:00 ` Carlos Maiolino 1 sibling, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2026-06-25 12:37 UTC (permalink / raw) To: Jan Kara Cc: Christoph Hellwig, cem, linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote: > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote: > > Adding Jan and Christian for quota and user_ns knowledge. > > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote: > > > From: Carlos Maiolino <cem@kernel.org> > > > > > > An user reported a bug where he managed to evade group's quota > > > by changing a file's gid to a different group id the same user > > > belonged to, even though quotas were enforced on both gids and the > > > file's size was big enough to exceed the quota's hardlimit. > > > > > > Commit eba0549bc7d1 replaced a capable() call by a > > > has_capability_noaudit() to prevent unnecessary selinux audit messages. > > > Turns out that both calls have slightly different semantics even though > > > their documentation seems similar. Where in a nutshell: > > > > > > capable() - Tests the task's effective credentials > > > has_ns_capability_noaudit() - Tests the task's real credentials > > > > Eww.. > > Yeah, that's a catch. > > > > This most of the time has no practical difference but in some cases like > > > changing attrs (specifically group id in this case) through a NFS client > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively > > > bypassing quota accounting checks. > > > > Yeah, this does look wrong. Do the other conversion in the above commit > > have tthe same issue? > > > > > Using instead ns_capable_noaudit() should fix this issue and prevent > > > selinux audit messages. > > > > The generic quota code manages to do without either has_capability_noaudit > > or ns_capable_noaudit. I think this might be hidden behind > > inode_owner_or_capable calls. Any idea why we're different? > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit() > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why > nobody complained about generic quota code is that we call > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this > for every transaction... I guess we should aim for the same to avoid the spurious audit logs. I.e. xfs_trans_alloc_ichange is currently always called either with force = true or force = this capable check. So as a first step we can move the check into xfs_trans_alloc_ichange for the !force case, and the propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e. only set XFS_QMOPT_FORCE_RES for the real forced case and instead have the capable check down in xfs_trans_dqresv. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize 2026-06-25 12:37 ` Christoph Hellwig @ 2026-06-25 15:00 ` Carlos Maiolino 2026-06-25 16:03 ` Darrick J. Wong 0 siblings, 1 reply; 8+ messages in thread From: Carlos Maiolino @ 2026-06-25 15:00 UTC (permalink / raw) To: Christoph Hellwig Cc: Jan Kara, linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote: > On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote: > > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote: > > > Adding Jan and Christian for quota and user_ns knowledge. > > > > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote: > > > > From: Carlos Maiolino <cem@kernel.org> > > > > > > > > An user reported a bug where he managed to evade group's quota > > > > by changing a file's gid to a different group id the same user > > > > belonged to, even though quotas were enforced on both gids and the > > > > file's size was big enough to exceed the quota's hardlimit. > > > > > > > > Commit eba0549bc7d1 replaced a capable() call by a > > > > has_capability_noaudit() to prevent unnecessary selinux audit messages. > > > > Turns out that both calls have slightly different semantics even though > > > > their documentation seems similar. Where in a nutshell: > > > > > > > > capable() - Tests the task's effective credentials > > > > has_ns_capability_noaudit() - Tests the task's real credentials > > > > > > Eww.. > > > > Yeah, that's a catch. > > > > > > This most of the time has no practical difference but in some cases like > > > > changing attrs (specifically group id in this case) through a NFS client > > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively > > > > bypassing quota accounting checks. > > > > > > Yeah, this does look wrong. Do the other conversion in the above commit > > > have tthe same issue? > > > > > > > Using instead ns_capable_noaudit() should fix this issue and prevent > > > > selinux audit messages. > > > > > > The generic quota code manages to do without either has_capability_noaudit > > > or ns_capable_noaudit. I think this might be hidden behind > > > inode_owner_or_capable calls. Any idea why we're different? > > > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit() > > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why > > nobody complained about generic quota code is that we call > > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this > > for every transaction... > > I guess we should aim for the same to avoid the spurious audit logs. > > I.e. xfs_trans_alloc_ichange is currently always called either with > force = true or force = this capable check. So as a first step we can > move the check into xfs_trans_alloc_ichange for the !force case, and the > propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e. > only set XFS_QMOPT_FORCE_RES for the real forced case and instead > have the capable check down in xfs_trans_dqresv. > Sounds fair, I'll give it a try tomorrow. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize 2026-06-25 15:00 ` Carlos Maiolino @ 2026-06-25 16:03 ` Darrick J. Wong 2026-06-26 7:34 ` Carlos Maiolino 0 siblings, 1 reply; 8+ messages in thread From: Darrick J. Wong @ 2026-06-25 16:03 UTC (permalink / raw) To: Carlos Maiolino Cc: Christoph Hellwig, Jan Kara, linux-xfs, stable, Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner On Thu, Jun 25, 2026 at 05:00:21PM +0200, Carlos Maiolino wrote: > On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote: > > On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote: > > > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote: > > > > Adding Jan and Christian for quota and user_ns knowledge. > > > > > > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote: > > > > > From: Carlos Maiolino <cem@kernel.org> > > > > > > > > > > An user reported a bug where he managed to evade group's quota > > > > > by changing a file's gid to a different group id the same user > > > > > belonged to, even though quotas were enforced on both gids and the > > > > > file's size was big enough to exceed the quota's hardlimit. > > > > > > > > > > Commit eba0549bc7d1 replaced a capable() call by a > > > > > has_capability_noaudit() to prevent unnecessary selinux audit messages. > > > > > Turns out that both calls have slightly different semantics even though > > > > > their documentation seems similar. Where in a nutshell: > > > > > > > > > > capable() - Tests the task's effective credentials > > > > > has_ns_capability_noaudit() - Tests the task's real credentials > > > > > > > > Eww.. > > > > > > Yeah, that's a catch. I spent a while trying to figure out how I went wrong in selecting the function name: * capable - Determine if the current task has a superior capability in effect * @cap: The capability to be tested for * * Return true if the current task has the given superior capability currently * available for use, false if not. vs. * has_capability_noaudit - Does a task have a capability (unaudited) in the * initial user ns * @t: The task in question * @cap: The capability to be tested for * * Return true if the specified task has the given superior capability * currently in effect to init_user_ns, false if not. Don't write an * audit message for the check. vs. * ns_capable_noaudit - Determine if the current task has a superior capability * (unaudited) in effect * @ns: The usernamespace we want the capability in * @cap: The capability to be tested for * * Return true if the current task has the given superior capability currently * available for use, false if not. All three of these sound the same to me. But what about the call path? capable -> ns_capable -> ns_capable_common has_capability_noaudit -> has_ns_capability_noaudit ns_capable_noaudit -> ns_capable_common Hum, maybe that's the difference -- ns_capable_common vs. has_ns_capability_noaudit? ns_capable_common calls security_capable() with current_cred(), whereas has_ns_capability_noaudit calls it with __task_cred(t), which is @current in the xfs_trans_alloc_ichange case. Aha! current_cred is current->cred, whereas __task_cred(current) is current->real_cred, and these aren't the same thing: /* Objective and real subjective task credentials (COW): */ const struct cred __rcu *real_cred; /* Effective (overridable) subjective task credentials (COW): */ const struct cred __rcu *cred; So I guess ns_capable_common (and wrappers) are testing the process' effective credentials, whereas has_ns_capability_noaudit is testing the process' real credentials? It would have been *really* nice if the documentation for those functions mentioned that distinction! > > > > > This most of the time has no practical difference but in some cases like > > > > > changing attrs (specifically group id in this case) through a NFS client > > > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively > > > > > bypassing quota accounting checks. > > > > > > > > Yeah, this does look wrong. Do the other conversion in the above commit > > > > have tthe same issue? > > > > > > > > > Using instead ns_capable_noaudit() should fix this issue and prevent > > > > > selinux audit messages. > > > > > > > > The generic quota code manages to do without either has_capability_noaudit > > > > or ns_capable_noaudit. I think this might be hidden behind > > > > inode_owner_or_capable calls. Any idea why we're different? > > > > > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit() > > > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why > > > nobody complained about generic quota code is that we call > > > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this > > > for every transaction... > > > > I guess we should aim for the same to avoid the spurious audit logs. > > > > I.e. xfs_trans_alloc_ichange is currently always called either with > > force = true or force = this capable check. So as a first step we can > > move the check into xfs_trans_alloc_ichange for the !force case, and the > > propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e. > > only set XFS_QMOPT_FORCE_RES for the real forced case and instead > > have the capable check down in xfs_trans_dqresv. > > > > Sounds fair, I'll give it a try tomorrow. So yeah, I agree we should change that to: ns_capable_noaudit(&init_user_ns, CAP_FOWNER) Though it's also weird that XFS gates it on CAP_FOWNER whereas the VFS checks CAP_SYS_RESOURCE. Though I would have added this: static inline bool current_may_ignore_quota_limits(void) { /* * If the current process' effective credentials include * CAP_FOWNER, then they're allowed to ignore the hard limit. */ return ns_capable_noaudit(&init_user_ns, CAP_FOWNER); } and then changed the callsites in xfs_ioctl/xfs_iops.c to: error = xfs_trans_alloc_ichange(ip, udqp, gdqp, NULL, current_may_ignore_quota_limits(), &tp); The has_capability_noaudit call in xfs_fsmap.c should change to ns_capable_noaudit. --D ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize 2026-06-25 16:03 ` Darrick J. Wong @ 2026-06-26 7:34 ` Carlos Maiolino 0 siblings, 0 replies; 8+ messages in thread From: Carlos Maiolino @ 2026-06-26 7:34 UTC (permalink / raw) To: Darrick J. Wong Cc: Christoph Hellwig, Jan Kara, linux-xfs, stable, Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner On Thu, Jun 25, 2026 at 09:03:17AM -0700, Darrick J. Wong wrote: > On Thu, Jun 25, 2026 at 05:00:21PM +0200, Carlos Maiolino wrote: > > On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote: > > > On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote: > > > > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote: > > > > > Adding Jan and Christian for quota and user_ns knowledge. > > > > > > > > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote: > > > > > > From: Carlos Maiolino <cem@kernel.org> > > > > > > > > > > > > An user reported a bug where he managed to evade group's quota > > > > > > by changing a file's gid to a different group id the same user > > > > > > belonged to, even though quotas were enforced on both gids and the > > > > > > file's size was big enough to exceed the quota's hardlimit. > > > > > > > > > > > > Commit eba0549bc7d1 replaced a capable() call by a > > > > > > has_capability_noaudit() to prevent unnecessary selinux audit messages. > > > > > > Turns out that both calls have slightly different semantics even though > > > > > > their documentation seems similar. Where in a nutshell: > > > > > > > > > > > > capable() - Tests the task's effective credentials > > > > > > has_ns_capability_noaudit() - Tests the task's real credentials > > > > > > > > > > Eww.. > > > > > > > > Yeah, that's a catch. > > I spent a while trying to figure out how I went wrong in selecting the > function name: > > * capable - Determine if the current task has a superior capability in effect > * @cap: The capability to be tested for > * > * Return true if the current task has the given superior capability currently > * available for use, false if not. > > vs. > > * has_capability_noaudit - Does a task have a capability (unaudited) in the > * initial user ns > * @t: The task in question > * @cap: The capability to be tested for > * > * Return true if the specified task has the given superior capability > * currently in effect to init_user_ns, false if not. Don't write an > * audit message for the check. > > vs. > > * ns_capable_noaudit - Determine if the current task has a superior capability > * (unaudited) in effect > * @ns: The usernamespace we want the capability in > * @cap: The capability to be tested for > * > * Return true if the current task has the given superior capability currently > * available for use, false if not. > > All three of these sound the same to me. But what about the call path? Yup, I reached the same conclusion, they are all the same, so I decided to look at the implementation... > > capable -> ns_capable -> ns_capable_common > > has_capability_noaudit -> has_ns_capability_noaudit > > ns_capable_noaudit -> ns_capable_common > > Hum, maybe that's the difference -- ns_capable_common vs. > has_ns_capability_noaudit? > > ns_capable_common calls security_capable() with current_cred(), whereas > has_ns_capability_noaudit calls it with __task_cred(t), which is > @current in the xfs_trans_alloc_ichange case. > > Aha! current_cred is current->cred, whereas __task_cred(current) is > current->real_cred, and these aren't the same thing: > > /* Objective and real subjective task credentials (COW): */ > const struct cred __rcu *real_cred; > > /* Effective (overridable) subjective task credentials (COW): */ > const struct cred __rcu *cred; > > So I guess ns_capable_common (and wrappers) are testing the process' > effective credentials, whereas has_ns_capability_noaudit is testing the > process' real credentials? > > It would have been *really* nice if the documentation for those > functions mentioned that distinction! Yup, this is a much more detailed versions from my same findings :) > > > > > > > This most of the time has no practical difference but in some cases like > > > > > > changing attrs (specifically group id in this case) through a NFS client > > > > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively > > > > > > bypassing quota accounting checks. > > > > > > > > > > Yeah, this does look wrong. Do the other conversion in the above commit > > > > > have tthe same issue? > > > > > > > > > > > Using instead ns_capable_noaudit() should fix this issue and prevent > > > > > > selinux audit messages. > > > > > > > > > > The generic quota code manages to do without either has_capability_noaudit > > > > > or ns_capable_noaudit. I think this might be hidden behind > > > > > inode_owner_or_capable calls. Any idea why we're different? > > > > > > > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit() > > > > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why > > > > nobody complained about generic quota code is that we call > > > > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this > > > > for every transaction... > > > > > > I guess we should aim for the same to avoid the spurious audit logs. > > > > > > I.e. xfs_trans_alloc_ichange is currently always called either with > > > force = true or force = this capable check. So as a first step we can > > > move the check into xfs_trans_alloc_ichange for the !force case, and the > > > propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e. > > > only set XFS_QMOPT_FORCE_RES for the real forced case and instead > > > have the capable check down in xfs_trans_dqresv. > > > > > > > Sounds fair, I'll give it a try tomorrow. > > So yeah, I agree we should change that to: > > ns_capable_noaudit(&init_user_ns, CAP_FOWNER) > > Though it's also weird that XFS gates it on CAP_FOWNER whereas the VFS > checks CAP_SYS_RESOURCE. Though I would have added this: Same to me, I think there is no real reason for us to be using CAP_FOWNER? I thought about changing it to CAP_SYS_RESOURCE but honestly I'd avoid doing that for LTS and just stick with CAP_FOWNER initially. > > static inline bool > current_may_ignore_quota_limits(void) > { > /* > * If the current process' effective credentials include > * CAP_FOWNER, then they're allowed to ignore the hard limit. > */ > return ns_capable_noaudit(&init_user_ns, CAP_FOWNER); > } > > and then changed the callsites in xfs_ioctl/xfs_iops.c to: > > error = xfs_trans_alloc_ichange(ip, udqp, gdqp, NULL, > current_may_ignore_quota_limits(), &tp); > > The has_capability_noaudit call in xfs_fsmap.c should change to > ns_capable_noaudit. I'm working on a version now adding some bits hch suggested. At the end I think we should add a new helper in kernel/capabililty, but I want to avoid doing that now so LTS can cleanly backport a fix. Then we can update it. I'm working on both things in parallel and see how it goes. I'll Cc you folks. > > --D > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-06-26 7:34 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260624101436.362533-1-cem@kernel.org>
2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig
2026-06-24 18:59 ` Carlos Maiolino
2026-06-25 10:43 ` Jan Kara
2026-06-25 11:01 ` Carlos Maiolino
2026-06-25 12:37 ` Christoph Hellwig
2026-06-25 15:00 ` Carlos Maiolino
2026-06-25 16:03 ` Darrick J. Wong
2026-06-26 7:34 ` Carlos Maiolino
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox