* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
[not found] <20260624101436.362533-1-cem@kernel.org>
@ 2026-06-24 13:40 ` Christoph Hellwig
2026-06-24 18:59 ` Carlos Maiolino
2026-06-25 10:43 ` Jan Kara
0 siblings, 2 replies; 8+ messages in thread
From: Christoph Hellwig @ 2026-06-24 13:40 UTC (permalink / raw)
To: cem
Cc: linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen,
Christoph Hellwig, Dr. Thomas Orgis, Jan Kara, linux-fsdevel,
Christian Brauner
Adding Jan and Christian for quota and user_ns knowledge.
On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> From: Carlos Maiolino <cem@kernel.org>
>
> An user reported a bug where he managed to evade group's quota
> by changing a file's gid to a different group id the same user
> belonged to, even though quotas were enforced on both gids and the
> file's size was big enough to exceed the quota's hardlimit.
>
> Commit eba0549bc7d1 replaced a capable() call by a
> has_capability_noaudit() to prevent unnecessary selinux audit messages.
> Turns out that both calls have slightly different semantics even though
> their documentation seems similar. Where in a nutshell:
>
> capable() - Tests the task's effective credentials
> has_ns_capability_noaudit() - Tests the task's real credentials
Eww..
> This most of the time has no practical difference but in some cases like
> changing attrs (specifically group id in this case) through a NFS client
> this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> bypassing quota accounting checks.
Yeah, this does look wrong. Do the other conversion in the above commit
have tthe same issue?
> Using instead ns_capable_noaudit() should fix this issue and prevent
> selinux audit messages.
The generic quota code manages to do without either has_capability_noaudit
or ns_capable_noaudit. I think this might be hidden behind
inode_owner_or_capable calls. Any idea why we're different?
> + bool force = ns_capable_noaudit(&init_user_ns,
> + CAP_FOWNER);
I'd do away with the local variable here.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig
@ 2026-06-24 18:59 ` Carlos Maiolino
2026-06-25 10:43 ` Jan Kara
1 sibling, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2026-06-24 18:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen,
Dr. Thomas Orgis, Jan Kara, linux-fsdevel, Christian Brauner
On Wed, Jun 24, 2026 at 03:40:39PM +0200, Christoph Hellwig wrote:
> Adding Jan and Christian for quota and user_ns knowledge.
>
> On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > From: Carlos Maiolino <cem@kernel.org>
> >
> > An user reported a bug where he managed to evade group's quota
> > by changing a file's gid to a different group id the same user
> > belonged to, even though quotas were enforced on both gids and the
> > file's size was big enough to exceed the quota's hardlimit.
> >
> > Commit eba0549bc7d1 replaced a capable() call by a
> > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > Turns out that both calls have slightly different semantics even though
> > their documentation seems similar. Where in a nutshell:
> >
> > capable() - Tests the task's effective credentials
> > has_ns_capability_noaudit() - Tests the task's real credentials
>
> Eww..
>
> > This most of the time has no practical difference but in some cases like
> > changing attrs (specifically group id in this case) through a NFS client
> > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > bypassing quota accounting checks.
>
> Yeah, this does look wrong. Do the other conversion in the above commit
> have tthe same issue?
I don't know yet, but I do hope to be able to convert everything back to
a capable_noaudit() helper. I need some time to look at the other
conversions, but giving the different credentials likely the are all
victim of the same issue.
>
> > Using instead ns_capable_noaudit() should fix this issue and prevent
> > selinux audit messages.
>
> The generic quota code manages to do without either has_capability_noaudit
> or ns_capable_noaudit. I think this might be hidden behind
> inode_owner_or_capable calls. Any idea why we're different?
Yes, I looked into that to check the feasibility to use the generic
code. This specific path goes through __dquot_transfer() which uses
ignore_hardlimit() helper to decide between bypassing quota enforcement
or not.
The later though uses capable(CAP_SYS_RESOURCE) which would trigger
selinux audit messages too at least for our case.
This though made me wonder if we shouldn't be using CAP_SYS_RESOURCE in
lieu of CAP_FOWNER in this path which seems more appropriate to bypass
quota enforcement. Either way though, I think we should at least have an
initial simple to backport patch to older releases.
>
> > + bool force = ns_capable_noaudit(&init_user_ns,
> > + CAP_FOWNER);
>
> I'd do away with the local variable here.
Sounds find to me. I wanted to avoid the function call with two arguments
as an argument. But I honestly have no strong preference.
Cheers!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig
2026-06-24 18:59 ` Carlos Maiolino
@ 2026-06-25 10:43 ` Jan Kara
2026-06-25 11:01 ` Carlos Maiolino
2026-06-25 12:37 ` Christoph Hellwig
1 sibling, 2 replies; 8+ messages in thread
From: Jan Kara @ 2026-06-25 10:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: cem, linux-xfs, stable, Darrick J. Wong, Dave Chinner,
Eric Sandeen, Dr. Thomas Orgis, Jan Kara, linux-fsdevel,
Christian Brauner
On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> Adding Jan and Christian for quota and user_ns knowledge.
>
> On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > From: Carlos Maiolino <cem@kernel.org>
> >
> > An user reported a bug where he managed to evade group's quota
> > by changing a file's gid to a different group id the same user
> > belonged to, even though quotas were enforced on both gids and the
> > file's size was big enough to exceed the quota's hardlimit.
> >
> > Commit eba0549bc7d1 replaced a capable() call by a
> > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > Turns out that both calls have slightly different semantics even though
> > their documentation seems similar. Where in a nutshell:
> >
> > capable() - Tests the task's effective credentials
> > has_ns_capability_noaudit() - Tests the task's real credentials
>
> Eww..
Yeah, that's a catch.
> > This most of the time has no practical difference but in some cases like
> > changing attrs (specifically group id in this case) through a NFS client
> > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > bypassing quota accounting checks.
>
> Yeah, this does look wrong. Do the other conversion in the above commit
> have tthe same issue?
>
> > Using instead ns_capable_noaudit() should fix this issue and prevent
> > selinux audit messages.
>
> The generic quota code manages to do without either has_capability_noaudit
> or ns_capable_noaudit. I think this might be hidden behind
> inode_owner_or_capable calls. Any idea why we're different?
Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
nobody complained about generic quota code is that we call
ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
for every transaction...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
2026-06-25 10:43 ` Jan Kara
@ 2026-06-25 11:01 ` Carlos Maiolino
2026-06-25 12:37 ` Christoph Hellwig
1 sibling, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2026-06-25 11:01 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, linux-xfs, stable, Darrick J. Wong,
Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel,
Christian Brauner
On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > Adding Jan and Christian for quota and user_ns knowledge.
> >
> > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > From: Carlos Maiolino <cem@kernel.org>
> > >
> > > An user reported a bug where he managed to evade group's quota
> > > by changing a file's gid to a different group id the same user
> > > belonged to, even though quotas were enforced on both gids and the
> > > file's size was big enough to exceed the quota's hardlimit.
> > >
> > > Commit eba0549bc7d1 replaced a capable() call by a
> > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > Turns out that both calls have slightly different semantics even though
> > > their documentation seems similar. Where in a nutshell:
> > >
> > > capable() - Tests the task's effective credentials
> > > has_ns_capability_noaudit() - Tests the task's real credentials
> >
> > Eww..
>
> Yeah, that's a catch.
>
> > > This most of the time has no practical difference but in some cases like
> > > changing attrs (specifically group id in this case) through a NFS client
> > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > bypassing quota accounting checks.
> >
> > Yeah, this does look wrong. Do the other conversion in the above commit
> > have tthe same issue?
> >
> > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > selinux audit messages.
> >
> > The generic quota code manages to do without either has_capability_noaudit
> > or ns_capable_noaudit. I think this might be hidden behind
> > inode_owner_or_capable calls. Any idea why we're different?
>
> Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> nobody complained about generic quota code is that we call
> ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> for every transaction...
But the capable() call works fine, it's the replacement by
has_capability_noaudit() that enables quota bypass I *think* generic
quota code is safe from this point.
>
> Honza
>
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
2026-06-25 10:43 ` Jan Kara
2026-06-25 11:01 ` Carlos Maiolino
@ 2026-06-25 12:37 ` Christoph Hellwig
2026-06-25 15:00 ` Carlos Maiolino
1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2026-06-25 12:37 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, cem, linux-xfs, stable, Darrick J. Wong,
Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel,
Christian Brauner
On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > Adding Jan and Christian for quota and user_ns knowledge.
> >
> > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > From: Carlos Maiolino <cem@kernel.org>
> > >
> > > An user reported a bug where he managed to evade group's quota
> > > by changing a file's gid to a different group id the same user
> > > belonged to, even though quotas were enforced on both gids and the
> > > file's size was big enough to exceed the quota's hardlimit.
> > >
> > > Commit eba0549bc7d1 replaced a capable() call by a
> > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > Turns out that both calls have slightly different semantics even though
> > > their documentation seems similar. Where in a nutshell:
> > >
> > > capable() - Tests the task's effective credentials
> > > has_ns_capability_noaudit() - Tests the task's real credentials
> >
> > Eww..
>
> Yeah, that's a catch.
>
> > > This most of the time has no practical difference but in some cases like
> > > changing attrs (specifically group id in this case) through a NFS client
> > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > bypassing quota accounting checks.
> >
> > Yeah, this does look wrong. Do the other conversion in the above commit
> > have tthe same issue?
> >
> > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > selinux audit messages.
> >
> > The generic quota code manages to do without either has_capability_noaudit
> > or ns_capable_noaudit. I think this might be hidden behind
> > inode_owner_or_capable calls. Any idea why we're different?
>
> Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> nobody complained about generic quota code is that we call
> ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> for every transaction...
I guess we should aim for the same to avoid the spurious audit logs.
I.e. xfs_trans_alloc_ichange is currently always called either with
force = true or force = this capable check. So as a first step we can
move the check into xfs_trans_alloc_ichange for the !force case, and the
propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e.
only set XFS_QMOPT_FORCE_RES for the real forced case and instead
have the capable check down in xfs_trans_dqresv.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
2026-06-25 12:37 ` Christoph Hellwig
@ 2026-06-25 15:00 ` Carlos Maiolino
2026-06-25 16:03 ` Darrick J. Wong
0 siblings, 1 reply; 8+ messages in thread
From: Carlos Maiolino @ 2026-06-25 15:00 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-xfs, stable, Darrick J. Wong, Dave Chinner,
Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner
On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote:
> On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > > Adding Jan and Christian for quota and user_ns knowledge.
> > >
> > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > > From: Carlos Maiolino <cem@kernel.org>
> > > >
> > > > An user reported a bug where he managed to evade group's quota
> > > > by changing a file's gid to a different group id the same user
> > > > belonged to, even though quotas were enforced on both gids and the
> > > > file's size was big enough to exceed the quota's hardlimit.
> > > >
> > > > Commit eba0549bc7d1 replaced a capable() call by a
> > > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > > Turns out that both calls have slightly different semantics even though
> > > > their documentation seems similar. Where in a nutshell:
> > > >
> > > > capable() - Tests the task's effective credentials
> > > > has_ns_capability_noaudit() - Tests the task's real credentials
> > >
> > > Eww..
> >
> > Yeah, that's a catch.
> >
> > > > This most of the time has no practical difference but in some cases like
> > > > changing attrs (specifically group id in this case) through a NFS client
> > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > > bypassing quota accounting checks.
> > >
> > > Yeah, this does look wrong. Do the other conversion in the above commit
> > > have tthe same issue?
> > >
> > > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > > selinux audit messages.
> > >
> > > The generic quota code manages to do without either has_capability_noaudit
> > > or ns_capable_noaudit. I think this might be hidden behind
> > > inode_owner_or_capable calls. Any idea why we're different?
> >
> > Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> > nobody complained about generic quota code is that we call
> > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> > for every transaction...
>
> I guess we should aim for the same to avoid the spurious audit logs.
>
> I.e. xfs_trans_alloc_ichange is currently always called either with
> force = true or force = this capable check. So as a first step we can
> move the check into xfs_trans_alloc_ichange for the !force case, and the
> propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e.
> only set XFS_QMOPT_FORCE_RES for the real forced case and instead
> have the capable check down in xfs_trans_dqresv.
>
Sounds fair, I'll give it a try tomorrow.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
2026-06-25 15:00 ` Carlos Maiolino
@ 2026-06-25 16:03 ` Darrick J. Wong
2026-06-26 7:34 ` Carlos Maiolino
0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2026-06-25 16:03 UTC (permalink / raw)
To: Carlos Maiolino
Cc: Christoph Hellwig, Jan Kara, linux-xfs, stable, Dave Chinner,
Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner
On Thu, Jun 25, 2026 at 05:00:21PM +0200, Carlos Maiolino wrote:
> On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote:
> > On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> > > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > > > Adding Jan and Christian for quota and user_ns knowledge.
> > > >
> > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > > > From: Carlos Maiolino <cem@kernel.org>
> > > > >
> > > > > An user reported a bug where he managed to evade group's quota
> > > > > by changing a file's gid to a different group id the same user
> > > > > belonged to, even though quotas were enforced on both gids and the
> > > > > file's size was big enough to exceed the quota's hardlimit.
> > > > >
> > > > > Commit eba0549bc7d1 replaced a capable() call by a
> > > > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > > > Turns out that both calls have slightly different semantics even though
> > > > > their documentation seems similar. Where in a nutshell:
> > > > >
> > > > > capable() - Tests the task's effective credentials
> > > > > has_ns_capability_noaudit() - Tests the task's real credentials
> > > >
> > > > Eww..
> > >
> > > Yeah, that's a catch.
I spent a while trying to figure out how I went wrong in selecting the
function name:
* capable - Determine if the current task has a superior capability in effect
* @cap: The capability to be tested for
*
* Return true if the current task has the given superior capability currently
* available for use, false if not.
vs.
* has_capability_noaudit - Does a task have a capability (unaudited) in the
* initial user ns
* @t: The task in question
* @cap: The capability to be tested for
*
* Return true if the specified task has the given superior capability
* currently in effect to init_user_ns, false if not. Don't write an
* audit message for the check.
vs.
* ns_capable_noaudit - Determine if the current task has a superior capability
* (unaudited) in effect
* @ns: The usernamespace we want the capability in
* @cap: The capability to be tested for
*
* Return true if the current task has the given superior capability currently
* available for use, false if not.
All three of these sound the same to me. But what about the call path?
capable -> ns_capable -> ns_capable_common
has_capability_noaudit -> has_ns_capability_noaudit
ns_capable_noaudit -> ns_capable_common
Hum, maybe that's the difference -- ns_capable_common vs.
has_ns_capability_noaudit?
ns_capable_common calls security_capable() with current_cred(), whereas
has_ns_capability_noaudit calls it with __task_cred(t), which is
@current in the xfs_trans_alloc_ichange case.
Aha! current_cred is current->cred, whereas __task_cred(current) is
current->real_cred, and these aren't the same thing:
/* Objective and real subjective task credentials (COW): */
const struct cred __rcu *real_cred;
/* Effective (overridable) subjective task credentials (COW): */
const struct cred __rcu *cred;
So I guess ns_capable_common (and wrappers) are testing the process'
effective credentials, whereas has_ns_capability_noaudit is testing the
process' real credentials?
It would have been *really* nice if the documentation for those
functions mentioned that distinction!
> > > > > This most of the time has no practical difference but in some cases like
> > > > > changing attrs (specifically group id in this case) through a NFS client
> > > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > > > bypassing quota accounting checks.
> > > >
> > > > Yeah, this does look wrong. Do the other conversion in the above commit
> > > > have tthe same issue?
> > > >
> > > > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > > > selinux audit messages.
> > > >
> > > > The generic quota code manages to do without either has_capability_noaudit
> > > > or ns_capable_noaudit. I think this might be hidden behind
> > > > inode_owner_or_capable calls. Any idea why we're different?
> > >
> > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> > > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> > > nobody complained about generic quota code is that we call
> > > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> > > for every transaction...
> >
> > I guess we should aim for the same to avoid the spurious audit logs.
> >
> > I.e. xfs_trans_alloc_ichange is currently always called either with
> > force = true or force = this capable check. So as a first step we can
> > move the check into xfs_trans_alloc_ichange for the !force case, and the
> > propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e.
> > only set XFS_QMOPT_FORCE_RES for the real forced case and instead
> > have the capable check down in xfs_trans_dqresv.
> >
>
> Sounds fair, I'll give it a try tomorrow.
So yeah, I agree we should change that to:
ns_capable_noaudit(&init_user_ns, CAP_FOWNER)
Though it's also weird that XFS gates it on CAP_FOWNER whereas the VFS
checks CAP_SYS_RESOURCE. Though I would have added this:
static inline bool
current_may_ignore_quota_limits(void)
{
/*
* If the current process' effective credentials include
* CAP_FOWNER, then they're allowed to ignore the hard limit.
*/
return ns_capable_noaudit(&init_user_ns, CAP_FOWNER);
}
and then changed the callsites in xfs_ioctl/xfs_iops.c to:
error = xfs_trans_alloc_ichange(ip, udqp, gdqp, NULL,
current_may_ignore_quota_limits(), &tp);
The has_capability_noaudit call in xfs_fsmap.c should change to
ns_capable_noaudit.
--D
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
2026-06-25 16:03 ` Darrick J. Wong
@ 2026-06-26 7:34 ` Carlos Maiolino
0 siblings, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2026-06-26 7:34 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Jan Kara, linux-xfs, stable, Dave Chinner,
Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner
On Thu, Jun 25, 2026 at 09:03:17AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 25, 2026 at 05:00:21PM +0200, Carlos Maiolino wrote:
> > On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> > > > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > > > > Adding Jan and Christian for quota and user_ns knowledge.
> > > > >
> > > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > > > > From: Carlos Maiolino <cem@kernel.org>
> > > > > >
> > > > > > An user reported a bug where he managed to evade group's quota
> > > > > > by changing a file's gid to a different group id the same user
> > > > > > belonged to, even though quotas were enforced on both gids and the
> > > > > > file's size was big enough to exceed the quota's hardlimit.
> > > > > >
> > > > > > Commit eba0549bc7d1 replaced a capable() call by a
> > > > > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > > > > Turns out that both calls have slightly different semantics even though
> > > > > > their documentation seems similar. Where in a nutshell:
> > > > > >
> > > > > > capable() - Tests the task's effective credentials
> > > > > > has_ns_capability_noaudit() - Tests the task's real credentials
> > > > >
> > > > > Eww..
> > > >
> > > > Yeah, that's a catch.
>
> I spent a while trying to figure out how I went wrong in selecting the
> function name:
>
> * capable - Determine if the current task has a superior capability in effect
> * @cap: The capability to be tested for
> *
> * Return true if the current task has the given superior capability currently
> * available for use, false if not.
>
> vs.
>
> * has_capability_noaudit - Does a task have a capability (unaudited) in the
> * initial user ns
> * @t: The task in question
> * @cap: The capability to be tested for
> *
> * Return true if the specified task has the given superior capability
> * currently in effect to init_user_ns, false if not. Don't write an
> * audit message for the check.
>
> vs.
>
> * ns_capable_noaudit - Determine if the current task has a superior capability
> * (unaudited) in effect
> * @ns: The usernamespace we want the capability in
> * @cap: The capability to be tested for
> *
> * Return true if the current task has the given superior capability currently
> * available for use, false if not.
>
> All three of these sound the same to me. But what about the call path?
Yup, I reached the same conclusion, they are all the same, so I decided
to look at the implementation...
>
> capable -> ns_capable -> ns_capable_common
>
> has_capability_noaudit -> has_ns_capability_noaudit
>
> ns_capable_noaudit -> ns_capable_common
>
> Hum, maybe that's the difference -- ns_capable_common vs.
> has_ns_capability_noaudit?
>
> ns_capable_common calls security_capable() with current_cred(), whereas
> has_ns_capability_noaudit calls it with __task_cred(t), which is
> @current in the xfs_trans_alloc_ichange case.
>
> Aha! current_cred is current->cred, whereas __task_cred(current) is
> current->real_cred, and these aren't the same thing:
>
> /* Objective and real subjective task credentials (COW): */
> const struct cred __rcu *real_cred;
>
> /* Effective (overridable) subjective task credentials (COW): */
> const struct cred __rcu *cred;
>
> So I guess ns_capable_common (and wrappers) are testing the process'
> effective credentials, whereas has_ns_capability_noaudit is testing the
> process' real credentials?
>
> It would have been *really* nice if the documentation for those
> functions mentioned that distinction!
Yup, this is a much more detailed versions from my same findings :)
>
> > > > > > This most of the time has no practical difference but in some cases like
> > > > > > changing attrs (specifically group id in this case) through a NFS client
> > > > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > > > > bypassing quota accounting checks.
> > > > >
> > > > > Yeah, this does look wrong. Do the other conversion in the above commit
> > > > > have tthe same issue?
> > > > >
> > > > > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > > > > selinux audit messages.
> > > > >
> > > > > The generic quota code manages to do without either has_capability_noaudit
> > > > > or ns_capable_noaudit. I think this might be hidden behind
> > > > > inode_owner_or_capable calls. Any idea why we're different?
> > > >
> > > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> > > > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> > > > nobody complained about generic quota code is that we call
> > > > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> > > > for every transaction...
> > >
> > > I guess we should aim for the same to avoid the spurious audit logs.
> > >
> > > I.e. xfs_trans_alloc_ichange is currently always called either with
> > > force = true or force = this capable check. So as a first step we can
> > > move the check into xfs_trans_alloc_ichange for the !force case, and the
> > > propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e.
> > > only set XFS_QMOPT_FORCE_RES for the real forced case and instead
> > > have the capable check down in xfs_trans_dqresv.
> > >
> >
> > Sounds fair, I'll give it a try tomorrow.
>
> So yeah, I agree we should change that to:
>
> ns_capable_noaudit(&init_user_ns, CAP_FOWNER)
>
> Though it's also weird that XFS gates it on CAP_FOWNER whereas the VFS
> checks CAP_SYS_RESOURCE. Though I would have added this:
Same to me, I think there is no real reason for us to be using
CAP_FOWNER? I thought about changing it to CAP_SYS_RESOURCE but honestly
I'd avoid doing that for LTS and just stick with CAP_FOWNER initially.
>
> static inline bool
> current_may_ignore_quota_limits(void)
> {
> /*
> * If the current process' effective credentials include
> * CAP_FOWNER, then they're allowed to ignore the hard limit.
> */
> return ns_capable_noaudit(&init_user_ns, CAP_FOWNER);
> }
>
> and then changed the callsites in xfs_ioctl/xfs_iops.c to:
>
> error = xfs_trans_alloc_ichange(ip, udqp, gdqp, NULL,
> current_may_ignore_quota_limits(), &tp);
>
> The has_capability_noaudit call in xfs_fsmap.c should change to
> ns_capable_noaudit.
I'm working on a version now adding some bits hch suggested. At the end
I think we should add a new helper in kernel/capabililty, but I want to
avoid doing that now so LTS can cleanly backport a fix. Then we can
update it.
I'm working on both things in parallel and see how it goes. I'll Cc you
folks.
>
> --D
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-06-26 7:34 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260624101436.362533-1-cem@kernel.org>
2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig
2026-06-24 18:59 ` Carlos Maiolino
2026-06-25 10:43 ` Jan Kara
2026-06-25 11:01 ` Carlos Maiolino
2026-06-25 12:37 ` Christoph Hellwig
2026-06-25 15:00 ` Carlos Maiolino
2026-06-25 16:03 ` Darrick J. Wong
2026-06-26 7:34 ` Carlos Maiolino
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox