Re: [PATCH] xfs: fix capabily check in xfs_setattr

Linux filesystem development
 help / color / mirror / Atom feed

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
       [not found] <20260624101436.362533-1-cem@kernel.org>
@ 2026-06-24 13:40 ` Christoph Hellwig
  2026-06-24 18:59   ` Carlos Maiolino
  2026-06-25 10:43   ` Jan Kara
  0 siblings, 2 replies; 8+ messages in thread
From: Christoph Hellwig @ 2026-06-24 13:40 UTC (permalink / raw)
  To: cem
  Cc: linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen,
	Christoph Hellwig, Dr. Thomas Orgis, Jan Kara, linux-fsdevel,
	Christian Brauner

Adding Jan and Christian for quota and user_ns knowledge.

On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> From: Carlos Maiolino <cem@kernel.org>
> 
> An user reported a bug where he managed to evade group's quota
> by changing a file's gid to a different group id the same user
> belonged to, even though quotas were enforced on both gids and the
> file's size was big enough to exceed the quota's hardlimit.
> 
> Commit eba0549bc7d1 replaced a capable() call by a
> has_capability_noaudit() to prevent unnecessary selinux audit messages.
> Turns out that both calls have slightly different semantics even though
> their documentation seems similar. Where in a nutshell:
> 
> capable() - Tests the task's effective credentials
> has_ns_capability_noaudit() - Tests the task's real credentials

Eww..

> This most of the time has no practical difference but in some cases like
> changing attrs (specifically group id in this case) through a NFS client
> this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> bypassing quota accounting checks.

Yeah, this does look wrong.  Do the other conversion in the above commit
have tthe same issue?

> Using instead ns_capable_noaudit() should fix this issue and prevent
> selinux audit messages.

The generic quota code manages to do without either has_capability_noaudit
or ns_capable_noaudit.  I think this might be hidden behind
inode_owner_or_capable calls.  Any idea why we're different?

> +	bool			force = ns_capable_noaudit(&init_user_ns,
> +							   CAP_FOWNER);

I'd do away with the local variable here.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
  2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig
@ 2026-06-24 18:59   ` Carlos Maiolino
  2026-06-25 10:43   ` Jan Kara
  1 sibling, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2026-06-24 18:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-xfs, stable, Darrick J. Wong, Dave Chinner, Eric Sandeen,
	Dr. Thomas Orgis, Jan Kara, linux-fsdevel, Christian Brauner

On Wed, Jun 24, 2026 at 03:40:39PM +0200, Christoph Hellwig wrote:
> Adding Jan and Christian for quota and user_ns knowledge.
> 
> On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > From: Carlos Maiolino <cem@kernel.org>
> > 
> > An user reported a bug where he managed to evade group's quota
> > by changing a file's gid to a different group id the same user
> > belonged to, even though quotas were enforced on both gids and the
> > file's size was big enough to exceed the quota's hardlimit.
> > 
> > Commit eba0549bc7d1 replaced a capable() call by a
> > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > Turns out that both calls have slightly different semantics even though
> > their documentation seems similar. Where in a nutshell:
> > 
> > capable() - Tests the task's effective credentials
> > has_ns_capability_noaudit() - Tests the task's real credentials
> 
> Eww..
> 
> > This most of the time has no practical difference but in some cases like
> > changing attrs (specifically group id in this case) through a NFS client
> > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > bypassing quota accounting checks.
> 
> Yeah, this does look wrong.  Do the other conversion in the above commit
> have tthe same issue?

I don't know yet, but I do hope to be able to convert everything back to
a capable_noaudit() helper. I need some time to look at the other
conversions, but giving the different credentials likely the are all
victim of the same issue.

> 
> > Using instead ns_capable_noaudit() should fix this issue and prevent
> > selinux audit messages.
> 
> The generic quota code manages to do without either has_capability_noaudit
> or ns_capable_noaudit.  I think this might be hidden behind
> inode_owner_or_capable calls.  Any idea why we're different?

Yes, I looked into that to check the feasibility to use the generic
code. This specific path goes through __dquot_transfer() which uses
ignore_hardlimit() helper to decide between bypassing quota enforcement
or not.
The later though uses capable(CAP_SYS_RESOURCE) which would trigger
selinux audit messages too at least for our case.

This though made me wonder if we shouldn't be using CAP_SYS_RESOURCE in
lieu of CAP_FOWNER in this path which seems more appropriate to bypass
quota enforcement. Either way though, I think we should at least have an
initial simple to backport patch to older releases.


> 
> > +	bool			force = ns_capable_noaudit(&init_user_ns,
> > +							   CAP_FOWNER);
> 
> I'd do away with the local variable here.

Sounds find to me. I wanted to avoid the function call with two arguments
as an argument. But I honestly have no strong preference.

Cheers!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
  2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig
  2026-06-24 18:59   ` Carlos Maiolino
@ 2026-06-25 10:43   ` Jan Kara
  2026-06-25 11:01     ` Carlos Maiolino
  2026-06-25 12:37     ` Christoph Hellwig
  1 sibling, 2 replies; 8+ messages in thread
From: Jan Kara @ 2026-06-25 10:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: cem, linux-xfs, stable, Darrick J. Wong, Dave Chinner,
	Eric Sandeen, Dr. Thomas Orgis, Jan Kara, linux-fsdevel,
	Christian Brauner

On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> Adding Jan and Christian for quota and user_ns knowledge.
> 
> On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > From: Carlos Maiolino <cem@kernel.org>
> > 
> > An user reported a bug where he managed to evade group's quota
> > by changing a file's gid to a different group id the same user
> > belonged to, even though quotas were enforced on both gids and the
> > file's size was big enough to exceed the quota's hardlimit.
> > 
> > Commit eba0549bc7d1 replaced a capable() call by a
> > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > Turns out that both calls have slightly different semantics even though
> > their documentation seems similar. Where in a nutshell:
> > 
> > capable() - Tests the task's effective credentials
> > has_ns_capability_noaudit() - Tests the task's real credentials
> 
> Eww..

Yeah, that's a catch.

> > This most of the time has no practical difference but in some cases like
> > changing attrs (specifically group id in this case) through a NFS client
> > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > bypassing quota accounting checks.
> 
> Yeah, this does look wrong.  Do the other conversion in the above commit
> have tthe same issue?
> 
> > Using instead ns_capable_noaudit() should fix this issue and prevent
> > selinux audit messages.
> 
> The generic quota code manages to do without either has_capability_noaudit
> or ns_capable_noaudit.  I think this might be hidden behind
> inode_owner_or_capable calls.  Any idea why we're different?

Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
nobody complained about generic quota code is that we call
ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
for every transaction...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
  2026-06-25 10:43   ` Jan Kara
@ 2026-06-25 11:01     ` Carlos Maiolino
  2026-06-25 12:37     ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2026-06-25 11:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-xfs, stable, Darrick J. Wong,
	Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel,
	Christian Brauner

On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > Adding Jan and Christian for quota and user_ns knowledge.
> > 
> > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > From: Carlos Maiolino <cem@kernel.org>
> > > 
> > > An user reported a bug where he managed to evade group's quota
> > > by changing a file's gid to a different group id the same user
> > > belonged to, even though quotas were enforced on both gids and the
> > > file's size was big enough to exceed the quota's hardlimit.
> > > 
> > > Commit eba0549bc7d1 replaced a capable() call by a
> > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > Turns out that both calls have slightly different semantics even though
> > > their documentation seems similar. Where in a nutshell:
> > > 
> > > capable() - Tests the task's effective credentials
> > > has_ns_capability_noaudit() - Tests the task's real credentials
> > 
> > Eww..
> 
> Yeah, that's a catch.
> 
> > > This most of the time has no practical difference but in some cases like
> > > changing attrs (specifically group id in this case) through a NFS client
> > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > bypassing quota accounting checks.
> > 
> > Yeah, this does look wrong.  Do the other conversion in the above commit
> > have tthe same issue?
> > 
> > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > selinux audit messages.
> > 
> > The generic quota code manages to do without either has_capability_noaudit
> > or ns_capable_noaudit.  I think this might be hidden behind
> > inode_owner_or_capable calls.  Any idea why we're different?
> 
> Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> nobody complained about generic quota code is that we call
> ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> for every transaction...

But the capable() call works fine, it's the replacement by
has_capability_noaudit() that enables quota bypass I *think* generic
quota code is safe from this point.

> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
  2026-06-25 10:43   ` Jan Kara
  2026-06-25 11:01     ` Carlos Maiolino
@ 2026-06-25 12:37     ` Christoph Hellwig
  2026-06-25 15:00       ` Carlos Maiolino
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2026-06-25 12:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, cem, linux-xfs, stable, Darrick J. Wong,
	Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel,
	Christian Brauner

On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > Adding Jan and Christian for quota and user_ns knowledge.
> > 
> > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > From: Carlos Maiolino <cem@kernel.org>
> > > 
> > > An user reported a bug where he managed to evade group's quota
> > > by changing a file's gid to a different group id the same user
> > > belonged to, even though quotas were enforced on both gids and the
> > > file's size was big enough to exceed the quota's hardlimit.
> > > 
> > > Commit eba0549bc7d1 replaced a capable() call by a
> > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > Turns out that both calls have slightly different semantics even though
> > > their documentation seems similar. Where in a nutshell:
> > > 
> > > capable() - Tests the task's effective credentials
> > > has_ns_capability_noaudit() - Tests the task's real credentials
> > 
> > Eww..
> 
> Yeah, that's a catch.
> 
> > > This most of the time has no practical difference but in some cases like
> > > changing attrs (specifically group id in this case) through a NFS client
> > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > bypassing quota accounting checks.
> > 
> > Yeah, this does look wrong.  Do the other conversion in the above commit
> > have tthe same issue?
> > 
> > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > selinux audit messages.
> > 
> > The generic quota code manages to do without either has_capability_noaudit
> > or ns_capable_noaudit.  I think this might be hidden behind
> > inode_owner_or_capable calls.  Any idea why we're different?
> 
> Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> nobody complained about generic quota code is that we call
> ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> for every transaction...

I guess we should aim for the same to avoid the spurious audit logs.

I.e. xfs_trans_alloc_ichange is currently always called either with
force = true or force = this capable check.  So as a first step we can
move the check into xfs_trans_alloc_ichange for the !force case, and the
propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e.
only set XFS_QMOPT_FORCE_RES for the real forced case and instead
have the capable check down in xfs_trans_dqresv.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
  2026-06-25 12:37     ` Christoph Hellwig
@ 2026-06-25 15:00       ` Carlos Maiolino
  2026-06-25 16:03         ` Darrick J. Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Carlos Maiolino @ 2026-06-25 15:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-xfs, stable, Darrick J. Wong, Dave Chinner,
	Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner

On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote:
> On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > > Adding Jan and Christian for quota and user_ns knowledge.
> > > 
> > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > > From: Carlos Maiolino <cem@kernel.org>
> > > > 
> > > > An user reported a bug where he managed to evade group's quota
> > > > by changing a file's gid to a different group id the same user
> > > > belonged to, even though quotas were enforced on both gids and the
> > > > file's size was big enough to exceed the quota's hardlimit.
> > > > 
> > > > Commit eba0549bc7d1 replaced a capable() call by a
> > > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > > Turns out that both calls have slightly different semantics even though
> > > > their documentation seems similar. Where in a nutshell:
> > > > 
> > > > capable() - Tests the task's effective credentials
> > > > has_ns_capability_noaudit() - Tests the task's real credentials
> > > 
> > > Eww..
> > 
> > Yeah, that's a catch.
> > 
> > > > This most of the time has no practical difference but in some cases like
> > > > changing attrs (specifically group id in this case) through a NFS client
> > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > > bypassing quota accounting checks.
> > > 
> > > Yeah, this does look wrong.  Do the other conversion in the above commit
> > > have tthe same issue?
> > > 
> > > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > > selinux audit messages.
> > > 
> > > The generic quota code manages to do without either has_capability_noaudit
> > > or ns_capable_noaudit.  I think this might be hidden behind
> > > inode_owner_or_capable calls.  Any idea why we're different?
> > 
> > Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> > nobody complained about generic quota code is that we call
> > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> > for every transaction...
> 
> I guess we should aim for the same to avoid the spurious audit logs.
> 
> I.e. xfs_trans_alloc_ichange is currently always called either with
> force = true or force = this capable check.  So as a first step we can
> move the check into xfs_trans_alloc_ichange for the !force case, and the
> propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e.
> only set XFS_QMOPT_FORCE_RES for the real forced case and instead
> have the capable check down in xfs_trans_dqresv.
> 

Sounds fair, I'll give it a try tomorrow.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
  2026-06-25 15:00       ` Carlos Maiolino
@ 2026-06-25 16:03         ` Darrick J. Wong
  2026-06-26  7:34           ` Carlos Maiolino
  0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2026-06-25 16:03 UTC (permalink / raw)
  To: Carlos Maiolino
  Cc: Christoph Hellwig, Jan Kara, linux-xfs, stable, Dave Chinner,
	Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner

On Thu, Jun 25, 2026 at 05:00:21PM +0200, Carlos Maiolino wrote:
> On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote:
> > On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> > > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > > > Adding Jan and Christian for quota and user_ns knowledge.
> > > > 
> > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > > > From: Carlos Maiolino <cem@kernel.org>
> > > > > 
> > > > > An user reported a bug where he managed to evade group's quota
> > > > > by changing a file's gid to a different group id the same user
> > > > > belonged to, even though quotas were enforced on both gids and the
> > > > > file's size was big enough to exceed the quota's hardlimit.
> > > > > 
> > > > > Commit eba0549bc7d1 replaced a capable() call by a
> > > > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > > > Turns out that both calls have slightly different semantics even though
> > > > > their documentation seems similar. Where in a nutshell:
> > > > > 
> > > > > capable() - Tests the task's effective credentials
> > > > > has_ns_capability_noaudit() - Tests the task's real credentials
> > > > 
> > > > Eww..
> > > 
> > > Yeah, that's a catch.

I spent a while trying to figure out how I went wrong in selecting the
function name:

 * capable - Determine if the current task has a superior capability in effect
 * @cap: The capability to be tested for
 *
 * Return true if the current task has the given superior capability currently
 * available for use, false if not.

vs.

 * has_capability_noaudit - Does a task have a capability (unaudited) in the
 * initial user ns
 * @t: The task in question
 * @cap: The capability to be tested for
 *
 * Return true if the specified task has the given superior capability
 * currently in effect to init_user_ns, false if not.  Don't write an
 * audit message for the check.

vs.

 * ns_capable_noaudit - Determine if the current task has a superior capability
 * (unaudited) in effect
 * @ns:  The usernamespace we want the capability in
 * @cap: The capability to be tested for
 *
 * Return true if the current task has the given superior capability currently
 * available for use, false if not.

All three of these sound the same to me.  But what about the call path?

capable -> ns_capable -> ns_capable_common

has_capability_noaudit -> has_ns_capability_noaudit

ns_capable_noaudit -> ns_capable_common

Hum, maybe that's the difference -- ns_capable_common vs.
has_ns_capability_noaudit?

ns_capable_common calls security_capable() with current_cred(), whereas
has_ns_capability_noaudit calls it with __task_cred(t), which is
@current in the xfs_trans_alloc_ichange case.

Aha!  current_cred is current->cred, whereas __task_cred(current) is
current->real_cred, and these aren't the same thing:

	/* Objective and real subjective task credentials (COW): */
	const struct cred __rcu		*real_cred;

	/* Effective (overridable) subjective task credentials (COW): */
	const struct cred __rcu		*cred;

So I guess ns_capable_common (and wrappers) are testing the process'
effective credentials, whereas has_ns_capability_noaudit is testing the
process' real credentials?

It would have been *really* nice if the documentation for those
functions mentioned that distinction!

> > > > > This most of the time has no practical difference but in some cases like
> > > > > changing attrs (specifically group id in this case) through a NFS client
> > > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > > > bypassing quota accounting checks.
> > > > 
> > > > Yeah, this does look wrong.  Do the other conversion in the above commit
> > > > have tthe same issue?
> > > > 
> > > > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > > > selinux audit messages.
> > > > 
> > > > The generic quota code manages to do without either has_capability_noaudit
> > > > or ns_capable_noaudit.  I think this might be hidden behind
> > > > inode_owner_or_capable calls.  Any idea why we're different?
> > > 
> > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> > > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> > > nobody complained about generic quota code is that we call
> > > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> > > for every transaction...
> > 
> > I guess we should aim for the same to avoid the spurious audit logs.
> > 
> > I.e. xfs_trans_alloc_ichange is currently always called either with
> > force = true or force = this capable check.  So as a first step we can
> > move the check into xfs_trans_alloc_ichange for the !force case, and the
> > propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e.
> > only set XFS_QMOPT_FORCE_RES for the real forced case and instead
> > have the capable check down in xfs_trans_dqresv.
> > 
> 
> Sounds fair, I'll give it a try tomorrow.

So yeah, I agree we should change that to:

	ns_capable_noaudit(&init_user_ns, CAP_FOWNER)

Though it's also weird that XFS gates it on CAP_FOWNER whereas the VFS
checks CAP_SYS_RESOURCE.  Though I would have added this:

static inline bool
current_may_ignore_quota_limits(void)
{
	/*
	 * If the current process' effective credentials include
	 * CAP_FOWNER, then they're allowed to ignore the hard limit.
	 */
	return ns_capable_noaudit(&init_user_ns, CAP_FOWNER);
}

and then changed the callsites in xfs_ioctl/xfs_iops.c to:

	error = xfs_trans_alloc_ichange(ip, udqp, gdqp, NULL,
			current_may_ignore_quota_limits(), &tp);

The has_capability_noaudit call in xfs_fsmap.c should change to
ns_capable_noaudit.

--D

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
  2026-06-25 16:03         ` Darrick J. Wong
@ 2026-06-26  7:34           ` Carlos Maiolino
  0 siblings, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2026-06-26  7:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Jan Kara, linux-xfs, stable, Dave Chinner,
	Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel, Christian Brauner

On Thu, Jun 25, 2026 at 09:03:17AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 25, 2026 at 05:00:21PM +0200, Carlos Maiolino wrote:
> > On Thu, Jun 25, 2026 at 02:37:54PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> > > > On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > > > > Adding Jan and Christian for quota and user_ns knowledge.
> > > > > 
> > > > > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > > > > From: Carlos Maiolino <cem@kernel.org>
> > > > > > 
> > > > > > An user reported a bug where he managed to evade group's quota
> > > > > > by changing a file's gid to a different group id the same user
> > > > > > belonged to, even though quotas were enforced on both gids and the
> > > > > > file's size was big enough to exceed the quota's hardlimit.
> > > > > > 
> > > > > > Commit eba0549bc7d1 replaced a capable() call by a
> > > > > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > > > > Turns out that both calls have slightly different semantics even though
> > > > > > their documentation seems similar. Where in a nutshell:
> > > > > > 
> > > > > > capable() - Tests the task's effective credentials
> > > > > > has_ns_capability_noaudit() - Tests the task's real credentials
> > > > > 
> > > > > Eww..
> > > > 
> > > > Yeah, that's a catch.
> 
> I spent a while trying to figure out how I went wrong in selecting the
> function name:
> 
>  * capable - Determine if the current task has a superior capability in effect
>  * @cap: The capability to be tested for
>  *
>  * Return true if the current task has the given superior capability currently
>  * available for use, false if not.
> 
> vs.
> 
>  * has_capability_noaudit - Does a task have a capability (unaudited) in the
>  * initial user ns
>  * @t: The task in question
>  * @cap: The capability to be tested for
>  *
>  * Return true if the specified task has the given superior capability
>  * currently in effect to init_user_ns, false if not.  Don't write an
>  * audit message for the check.
> 
> vs.
> 
>  * ns_capable_noaudit - Determine if the current task has a superior capability
>  * (unaudited) in effect
>  * @ns:  The usernamespace we want the capability in
>  * @cap: The capability to be tested for
>  *
>  * Return true if the current task has the given superior capability currently
>  * available for use, false if not.
> 
> All three of these sound the same to me.  But what about the call path?

Yup, I reached the same conclusion, they are all the same, so I decided
to look at the implementation...

> 
> capable -> ns_capable -> ns_capable_common
> 
> has_capability_noaudit -> has_ns_capability_noaudit
> 
> ns_capable_noaudit -> ns_capable_common
> 
> Hum, maybe that's the difference -- ns_capable_common vs.
> has_ns_capability_noaudit?
> 
> ns_capable_common calls security_capable() with current_cred(), whereas
> has_ns_capability_noaudit calls it with __task_cred(t), which is
> @current in the xfs_trans_alloc_ichange case.
> 
> Aha!  current_cred is current->cred, whereas __task_cred(current) is
> current->real_cred, and these aren't the same thing:
> 
> 	/* Objective and real subjective task credentials (COW): */
> 	const struct cred __rcu		*real_cred;
> 
> 	/* Effective (overridable) subjective task credentials (COW): */
> 	const struct cred __rcu		*cred;
> 
> So I guess ns_capable_common (and wrappers) are testing the process'
> effective credentials, whereas has_ns_capability_noaudit is testing the
> process' real credentials?
> 
> It would have been *really* nice if the documentation for those
> functions mentioned that distinction!

Yup, this is a much more detailed versions from my same findings :)


> 
> > > > > > This most of the time has no practical difference but in some cases like
> > > > > > changing attrs (specifically group id in this case) through a NFS client
> > > > > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > > > > bypassing quota accounting checks.
> > > > > 
> > > > > Yeah, this does look wrong.  Do the other conversion in the above commit
> > > > > have tthe same issue?
> > > > > 
> > > > > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > > > > selinux audit messages.
> > > > > 
> > > > > The generic quota code manages to do without either has_capability_noaudit
> > > > > or ns_capable_noaudit.  I think this might be hidden behind
> > > > > inode_owner_or_capable calls.  Any idea why we're different?
> > > > 
> > > > Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> > > > function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> > > > nobody complained about generic quota code is that we call
> > > > ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> > > > for every transaction...
> > > 
> > > I guess we should aim for the same to avoid the spurious audit logs.
> > > 
> > > I.e. xfs_trans_alloc_ichange is currently always called either with
> > > force = true or force = this capable check.  So as a first step we can
> > > move the check into xfs_trans_alloc_ichange for the !force case, and the
> > > propagate that through XFS_QMOPT_FORCE_RES into xfs_trans_dqresv, i.e.
> > > only set XFS_QMOPT_FORCE_RES for the real forced case and instead
> > > have the capable check down in xfs_trans_dqresv.
> > > 
> > 
> > Sounds fair, I'll give it a try tomorrow.
> 
> So yeah, I agree we should change that to:
> 
> 	ns_capable_noaudit(&init_user_ns, CAP_FOWNER)
> 
> Though it's also weird that XFS gates it on CAP_FOWNER whereas the VFS
> checks CAP_SYS_RESOURCE.  Though I would have added this:

Same to me, I think there is no real reason for us to be using
CAP_FOWNER? I thought about changing it to CAP_SYS_RESOURCE but honestly
I'd avoid doing that for LTS and just stick with CAP_FOWNER initially.

> 
> static inline bool
> current_may_ignore_quota_limits(void)
> {
> 	/*
> 	 * If the current process' effective credentials include
> 	 * CAP_FOWNER, then they're allowed to ignore the hard limit.
> 	 */
> 	return ns_capable_noaudit(&init_user_ns, CAP_FOWNER);
> }
> 
> and then changed the callsites in xfs_ioctl/xfs_iops.c to:
> 
> 	error = xfs_trans_alloc_ichange(ip, udqp, gdqp, NULL,
> 			current_may_ignore_quota_limits(), &tp);
> 
> The has_capability_noaudit call in xfs_fsmap.c should change to
> ns_capable_noaudit.

I'm working on a version now adding some bits hch suggested. At the end
I think we should add a new helper in kernel/capabililty, but I want to
avoid doing that now so LTS can cleanly backport a fix. Then we can
update it.
I'm working on both things in parallel and see how it goes. I'll Cc you
folks.

> 
> --D
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-06-26  7:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260624101436.362533-1-cem@kernel.org>
2026-06-24 13:40 ` [PATCH] xfs: fix capabily check in xfs_setattr_nonsize Christoph Hellwig
2026-06-24 18:59   ` Carlos Maiolino
2026-06-25 10:43   ` Jan Kara
2026-06-25 11:01     ` Carlos Maiolino
2026-06-25 12:37     ` Christoph Hellwig
2026-06-25 15:00       ` Carlos Maiolino
2026-06-25 16:03         ` Darrick J. Wong
2026-06-26  7:34           ` Carlos Maiolino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox