Linux filesystem development
 help / color / mirror / Atom feed
* Re: [PATCH v3 5/5] block: validate user space vectors during extraction
From: Christoph Hellwig @ 2026-06-25 11:57 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch, stable
In-Reply-To: <20260624170905.3972095-6-kbusch@meta.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCH v3 4/5] zloop: set dma_alignment from the backing files for direct I/O
From: Christoph Hellwig @ 2026-06-25 11:57 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch
In-Reply-To: <20260624170905.3972095-5-kbusch@meta.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCH v3 3/5] loop: set dma_alignment from the backing file for direct I/O
From: Christoph Hellwig @ 2026-06-25 11:57 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch
In-Reply-To: <20260624170905.3972095-4-kbusch@meta.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCH v3 1/5] block: use blkdev_iov_iter_get_pages status for errors
From: Christoph Hellwig @ 2026-06-25 11:56 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch
In-Reply-To: <20260624170905.3972095-2-kbusch@meta.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCH v2 1/2] iomap: add helper to mark folio uptodate
From: Christoph Hellwig @ 2026-06-25 11:50 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, willy, hch, djwong, miklos, linux-fsdevel, fuse-devel
In-Reply-To: <20260624212925.1668662-2-joannelkoong@gmail.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply

* Re: [PATCH v4] coredump: Add /proc/<pid>/coredump_pre_exit for pre-exit before dumping
From: David Hildenbrand (Arm) @ 2026-06-25 11:43 UTC (permalink / raw)
  To: Pedro Falcato
  Cc: Christian Brauner, Mike Rapoport, Lorenzo Stoakes, mjguzik,
	ebiederm, viro, jack, jlayton, chuck.lever, alex.aring, arnd,
	keescook, mcgrof, j.granados, allen.lkml, linux-fsdevel,
	linux-kernel, linux-arch, Xin Zhao, linux-mm
In-Reply-To: <aj0Mr3e9yt0kU-Qj@pedro-suse>

On 6/25/26 13:18, Pedro Falcato wrote:
> On Thu, Jun 25, 2026 at 12:57:02PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> This makes no sense. I think you really need to sit down and think about
>>> a design for this that doesn't introduce state machinery for boot, mm,
>>> and the VFS in one shot to solve a fringe problem...
>>
>> Staring at exit_mmap_mapped_shared(), ... this looks rather hacky ("let's fake
>> munmap and set some magical flags").
>>
>> We're essentially saying "we don't want (pretty much) anything that's MAP_SHARED
>> in the coredump". And for some reason someone should configure that, that's a
>> rather weird toggle tbh.
>>
>> And the granularity ("file-backed shared memory") is completely odd.
>>
>>
>> Aren't there other ways we could optimize this internally?
>>
>> Like, if we know that a process is dead and cannot run anymore, downgrade writes
>> to reads (and make sure we block GUP write attempts accordingly), or would that
>> also not be sufficient?
>>
>>
>> Another thought:
>>
>> fs/coredump.c calls get_dump_page().
>>
>> get_dump_page() will not fault in any memory. So if a page is not in the page
>> tables at the time of the dump, it will not get included in the coredump. Which
>> means, that whether most non-anonymous memory will be included in a coredump is
>> already like playing the lottery.
>>
>> This is true for MAP_SHARED file mappings and MAP_PRIVATE file mappings without
>> private modifications.
>>
>> Which makes me wonder: How much is tooling relying on file-backed pages to end
>> up in a coredump?
> 
> FWIW this mechanism already exists, see /proc/self/coredump_filter. The
> default is bits 0, 1, 4 and 5 (see core(5)), which maps back to no file pages
> being dumped to a core dump, apart from ELF headers (these help the debugger
> trace back the mapped binary to the debug info using the buildid).
> 
> So the answer to this question is "approximately none" :)
> 

Ah, thanks! vma_dump_size() honors this, and I am sure through some magical
routing the information stored in m->dump_size will end up not dumping these pages.

Staring at elf_core_dump(), this "unmap some stuff" part is really, really
nasty, as it effectively removes the VMAs->segments from the dump. (unless I am
missing something important)

-- 
Cheers,

David

^ permalink raw reply

* Re: [stable/linux-6.12.y 0/2] Backport Fix incorrect overlayfs mmap() and mprotect() LSM access controls
From: Greg KH @ 2026-06-25 11:31 UTC (permalink / raw)
  To: Cai Xinchen
  Cc: viro, brauner, jack, miklos, amir73il, paul, jmorris, serge,
	stephen.smalley.work, omosnace, bboscaccy, linux-fsdevel,
	linux-kernel, linux-unionfs, linux-security-module, selinux, bpf,
	lujialin4
In-Reply-To: <20260622031509.2663919-1-caixinchen1@huawei.com>

On Mon, Jun 22, 2026 at 11:15:07AM +0800, Cai Xinchen wrote:
> Backport the patch series
> "Fix incorrect overlayfs mmap() and mprotect() LSM access controls" [1]
> to 6.12 lts
> 
> I test selinux-testsuite[2] overlay test, it pass 135 tests.
> 
> [1] https://lore.kernel.org/all/20260403030848.731867-5-paul@paul-moore.com/
> [2] https://github.com/SELinuxProject/selinux-testsuite

Again, upstream git ids are needed.  Please redo all of these and
resend.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] fs/proc: fix KPF_KSM reported for all anonymous pages
From: Jinjiang Tu @ 2026-06-25 11:21 UTC (permalink / raw)
  To: David Hildenbrand (Arm), akpm, ziy, luizcap, willy, linmiaohe,
	svetly.todorov, xu.xin16, chengming.zhou, linux-fsdevel, linux-mm
  Cc: wangkefeng.wang, sunnanyong
In-Reply-To: <6e39171c-8333-463d-9ea6-db331965d2c4@kernel.org>


在 2026/6/25 14:39, David Hildenbrand (Arm) 写道:
> On 6/23/26 03:37, Jinjiang Tu wrote:
>> 在 2026/6/22 19:45, David Hildenbrand (Arm) 写道:
>>> On 6/22/26 11:15, Jinjiang Tu wrote:
>>>> Reading /proc/kpageflags for any anonymous page returns KPF_KSM set, even
>>>> when KSM is not in use. As a result, tools misclassify all anonymous pages
>>>> as KSM merged.
>>>>
>>>> In stable_page_flags(), if the page is anonymous, then use (mapping &
>>>> FOLIO_MAPPING_KSM) check to identify if the anonymous page is KSM page.
>>>> However, FOLIO_MAPPING_KSM is FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM,
>>>> (mapping & FOLIO_MAPPING_KSM) check returns true for all nonymous pages.
>>>>
>>>> To fix it, use FOLIO_MAPPING_ANON_KSM instead.
>>>>
>>>> Fixes: dee3d0bef2b0 ("proc: rewrite stable_page_flags()")
>>> Right,
>>>
>>> #define PAGE_MAPPING_KSM       (PAGE_MAPPING_ANON | PAGE_MAPPING_ANON_KSM)
>>>
>>> Which we later renamed to FOLIO_MAPPING_KSM.
>>>
>>>
>>> Before switching to manual flag checks, PageKsm() translated to folio_test_ksm()
>>> that checked whether the values actually matched:
>>>
>>> ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_KSM;
>>>
>>>
>>> This only affects /proc/kpageflags (well, and hwpoison inject on a weird testing
>>> interface), so it's not really that relevant for real workloads (debugging and
>>> testing).
>>>
>>> So not sure whether we should CC:stable. Likely not.
>> /proc/kpageflags is generally used only for analysis and is unlikely to be
>> used in production environments. I found this issue due to I was analyzing
>> pfns allocated by which stacks are KSM-merged. So I think it's unnecessary
>> to CC:stable.
>>
>>>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>>>> ---
>>>>    fs/proc/page.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/proc/page.c b/fs/proc/page.c
>>>> index f9b2c2c906cd..cef8ded97610 100644
>>>> --- a/fs/proc/page.c
>>>> +++ b/fs/proc/page.c
>>>> @@ -173,7 +173,7 @@ u64 stable_page_flags(const struct page *page)
>>>>            u |= 1 << KPF_MMAP;
>>>>        if (is_anon) {
>>>>            u |= 1 << KPF_ANON;
>>>> -        if (mapping & FOLIO_MAPPING_KSM)
>>>> +        if (mapping & FOLIO_MAPPING_ANON_KSM)
>>> Wonder whether we should just do
>>>
>>>          if ((mapping & FOLIO_MAPPING_FLAGS) == FOLIO_MAPPING_KSM)
>>>
>>> To match what we have in folio_test_ksm.
>>>
>>> (although I doubt we would reuse this flag for other purposes, likely
>>> it's more future proof to check it like that)
>> Both are ok. The following check has checked FOLIO_MAPPING_ANON,
>>
>>      if (is_anon) {
>>          if (mapping & FOLIO_MAPPING_ANON_KSM)
>>      }
>>
>> So it's equivalent to do
>>      if ((mapping & FOLIO_MAPPING_FLAGS) == FOLIO_MAPPING_KSM)
>> or
>>      if (mapping & FOLIO_MAPPING_ANON_KSM)
> As I said, matching precisely what we have in folio_test_ksm() is clearer. We
> don't have any users of FOLIO_MAPPING_ANON_KSM outside of page-flags.h for a
> reason :)

Indeed, I will send v2.

>

^ permalink raw reply

* Re: [PATCH v4] coredump: Add /proc/<pid>/coredump_pre_exit for pre-exit before dumping
From: Pedro Falcato @ 2026-06-25 11:18 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Christian Brauner, Mike Rapoport, Lorenzo Stoakes, mjguzik,
	ebiederm, viro, jack, jlayton, chuck.lever, alex.aring, arnd,
	keescook, mcgrof, j.granados, allen.lkml, linux-fsdevel,
	linux-kernel, linux-arch, Xin Zhao, linux-mm
In-Reply-To: <9105c433-44a7-4e8f-bacb-def93d11a7f2@kernel.org>

On Thu, Jun 25, 2026 at 12:57:02PM +0200, David Hildenbrand (Arm) wrote:
> >> +
> >>  #define F_DUPFD		0	/* dup */
> >>  #define F_GETFD		1	/* get close_on_exec */
> >>  #define F_SETFD		2	/* set/clear close_on_exec */
> >> diff --git a/kernel/fork.c b/kernel/fork.c
> >> index a679b2448234..84f1ee7f32cf 100644
> >> --- a/kernel/fork.c
> >> +++ b/kernel/fork.c
> >> @@ -1030,6 +1030,18 @@ static int __init coredump_filter_setup(char *s)
> >>  
> >>  __setup("coredump_filter=", coredump_filter_setup);
> >>  
> >> +static unsigned long default_dump_pre_exit;
> >> +
> >> +static int __init coredump_pre_exit_setup(char *s)
> >> +{
> >> +	default_dump_pre_exit =
> >> +		(simple_strtoul(s, NULL, 0) << MMF_DUMP_PRE_EXIT_SHIFT) &
> >> +		MMF_DUMP_PRE_EXIT_MASK;
> >> +	return 1;
> >> +}
> >> +
> >> +__setup("coredump_pre_exit=", coredump_pre_exit_setup);
> > 
> > This makes no sense. I think you really need to sit down and think about
> > a design for this that doesn't introduce state machinery for boot, mm,
> > and the VFS in one shot to solve a fringe problem...
> 
> Staring at exit_mmap_mapped_shared(), ... this looks rather hacky ("let's fake
> munmap and set some magical flags").
> 
> We're essentially saying "we don't want (pretty much) anything that's MAP_SHARED
> in the coredump". And for some reason someone should configure that, that's a
> rather weird toggle tbh.
> 
> And the granularity ("file-backed shared memory") is completely odd.
> 
> 
> Aren't there other ways we could optimize this internally?
> 
> Like, if we know that a process is dead and cannot run anymore, downgrade writes
> to reads (and make sure we block GUP write attempts accordingly), or would that
> also not be sufficient?
> 
> 
> Another thought:
> 
> fs/coredump.c calls get_dump_page().
> 
> get_dump_page() will not fault in any memory. So if a page is not in the page
> tables at the time of the dump, it will not get included in the coredump. Which
> means, that whether most non-anonymous memory will be included in a coredump is
> already like playing the lottery.
> 
> This is true for MAP_SHARED file mappings and MAP_PRIVATE file mappings without
> private modifications.
> 
> Which makes me wonder: How much is tooling relying on file-backed pages to end
> up in a coredump?

FWIW this mechanism already exists, see /proc/self/coredump_filter. The
default is bits 0, 1, 4 and 5 (see core(5)), which maps back to no file pages
being dumped to a core dump, apart from ELF headers (these help the debugger
trace back the mapped binary to the debug info using the buildid).

So the answer to this question is "approximately none" :)

-- 
Pedro

^ permalink raw reply

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
From: Zhang Yi @ 2026-06-25 11:16 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung), Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, linux-fsdevel, linux-api
In-Reply-To: <557b2e5c-7c65-48de-87a9-6fba21eca99f@huaweicloud.com>

On 6/18/2026 11:22 AM, Zhang Yi wrote:
> On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:
>> On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
>>> [API questions for Zhang and -fsdevel/ -api below)
>>>
>>>> +	unsigned int		blksize = i_blocksize(inode);
>>>> +	loff_t			offset_aligned = round_down(offset, blksize);
>>>
>>> I think this actually needs to found up instead of rounding down.
>>>
>>>> +	/*
>>>> +	 * Zero the tail of the old EOF block and any space up to the new
>>>> +	 * offset.
>>>> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
>>>> +	 * zeroing those blocks.
>>>> +	 */
>>>> +	if (offset_aligned > old_size) {
>>>> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
>>>> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
>>>> +				NULL, &did_zero);
>>>> +		if (error)
>>>> +			return error;
>>>> +	}
>>>
>>> ... then this will properly zero from the old i_size to the first block
>>> boundary after the old size.
>>
>> Hmm, right now we do this:
>>
>> |----------|----------|----------|
>>     ^      ^     ^    ^
>>     |      |     |    |
>>  old_size  |   offset |
>>            |          |
>> 	off_rd       off_ru
>>
>> At the moment, we zero out old_size to off_rd and pass offset to
>> xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.
>>
>> What you are proposing is to zero out old_size to off_ru, and pass
>> off_ru to xfs_alloc_file_space. I don't exactly understand the
>> difference.
> 
> IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases, if the
> 'offset' and 'end' are not block-size aligned, then:
> 
> 1) if the two blocks straddling the boundaries have not yet been allocated,
>    or allocated as unwritten, we should round outward the allocation range
>    and zero out all allocated blocks, including those two boundary blocks.
> 2) if the blocks at the boundaries are already in the written state — which
>    can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
>    should be careful here: we should only zero the ranges [offset, offset_ru)
>    and [end_rd, end) for the boundary blocks, leaving the already-written
>    portions of the boundary blocks intact.
> 
I've thought about this more and feel that we need to add one more
scenario here:

3) If the blocks at the boundaries are in dirty unwritten or in delay
   allocated state, it should be handled the same way as scenario 2.
   Additionally, before returning, we need to flush these two boundary
   blocks that have been partially zeroed, to ensure that after
   FALLOC_FL_WRITE_ZEROES, all ranges are in the written state.

What do you think?

Best Regards,
Yi.
















^ permalink raw reply

* Re: update BDI {io,ra}_pages values based on the RT device limits
From: Jan Kara @ 2026-06-25 11:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Carlos Maiolino, Filip Blagojevic, Matthew Wilcox,
	Damien Le Moal, linux-fsdevel, linux-xfs
In-Reply-To: <20260624134950.GA6471@lst.de>

On Wed 24-06-26 15:49:50, Christoph Hellwig wrote:
> On Wed, Jun 24, 2026 at 02:26:35PM +0200, Jan Kara wrote:
> > A cleaner but still relatively simple way how to fix your problem would be
> > to do something similar like e.g. btrfs does. That is when you have a
> > filesystem with multiple backing devices, you create a new BDI (using
> > super_setup_bdi()) and set its parameters based on the combination of
> > devices you have.
> 
> The big downside of that is that it breaks the userspace ABI - any
> previous changes to BDI paramters, most importantly the read-ahead size,
> would now only apply to the BDI for s_bdev which isn't actually used
> by anything.

Hum, right. XFS users are used to tweak
/sys/block/<dev>/queue/read_ahead_kb
which would stop influencing anything. I guess that could indeed break some
people's expectations.

> > >   1) support multiple BDIs per file system.
> > > 
> > > 	This would work pretty well for XFS, as data for a given file is
> > > 	always entirely on one device, and the VFS writeback code is not
> > > 	used for metadata.  But it probably doesn't work well for other
> > > 	cases
> > 
> > Since looking up the bdi is abstracted through inode_to_bdi() (which
> > currently goes through the superblock), this would be relatively easy to do
> > but either the inode would have to store the bdi pointer or we'd have to
> > have an operation to query it. Just at this point I'm not fully convinced
> > the additional complexity / memory cost is worth it compared to what can be
> > achieved with a synthetic bdi.
> 
> Same.

One more idea: We already cache bdi's ra_pages in struct file_ra_state on
file open. It wouldn't be that hard to allow filesystem to override the
values in its open method (currently it isn't possible only because
file_ra_state_init() is called after ->open). For io_pages we currently
always go to the bdi so we'd have to add it to struct file_ra_state.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [stable/linux-6.18.y 2/2] selinux: fix overlayfs mmap() and mprotect() access checks
From: Greg KH @ 2026-06-25 11:06 UTC (permalink / raw)
  To: Cai Xinchen
  Cc: viro, brauner, jack, miklos, amir73il, paul, jmorris, serge,
	stephen.smalley.work, omosnace, bboscaccy, linux-fsdevel,
	linux-kernel, linux-unionfs, linux-security-module, selinux, bpf,
	lujialin4
In-Reply-To: <20260622031416.2663747-3-caixinchen1@huawei.com>

On Mon, Jun 22, 2026 at 11:14:16AM +0800, Cai Xinchen wrote:
> From: Paul Moore <paul@paul-moore.com>
> 
> The existing SELinux security model for overlayfs is to allow access if
> the current task is able to access the top level file (the "user" file)
> and the mounter's credentials are sufficient to access the lower
> level file (the "backing" file).  Unfortunately, the current code does
> not properly enforce these access controls for both mmap() and mprotect()
> operations on overlayfs filesystems.
> 
> This patch makes use of the newly created security_mmap_backing_file()
> LSM hook to provide the missing backing file enforcement for mmap()
> operations, and leverages the backing file API and new LSM blob to
> provide the necessary information to properly enforce the mprotect()
> access controls.
> 
> Cc: stable@vger.kernel.org
> Acked-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
> ---
>  security/selinux/hooks.c          | 242 ++++++++++++++++++++++--------
>  security/selinux/include/objsec.h |  11 ++
>  2 files changed, 189 insertions(+), 64 deletions(-)

Again, what is the git id?

thanks,

greg k-h

^ permalink raw reply

* Re: [stable/linux-6.18.y 1/2] lsm: add backing_file LSM hooks
From: Greg KH @ 2026-06-25 11:06 UTC (permalink / raw)
  To: Cai Xinchen
  Cc: viro, brauner, jack, miklos, amir73il, paul, jmorris, serge,
	stephen.smalley.work, omosnace, bboscaccy, linux-fsdevel,
	linux-kernel, linux-unionfs, linux-security-module, selinux, bpf,
	lujialin4
In-Reply-To: <20260622031416.2663747-2-caixinchen1@huawei.com>

On Mon, Jun 22, 2026 at 11:14:15AM +0800, Cai Xinchen wrote:
> From: Paul Moore <paul@paul-moore.com>
> 
> Mainline declares lsm_backing_file_cache in security/lsm.h.  Linux 6.18.y
> does not have security/lsm_init.c or security/lsm.h; the cache variable
> is defined locally as static struct kmem_cache *lsm_backing_file_cache in
> security/security.c.
> 
> Original commit message:

What is the original git commit id?

ANd put the "changes" down in the signed-off-by area, like other
backports normally do please.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
From: Carlos Maiolino @ 2026-06-25 11:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-xfs, stable, Darrick J. Wong,
	Dave Chinner, Eric Sandeen, Dr. Thomas Orgis, linux-fsdevel,
	Christian Brauner
In-Reply-To: <qcsdbdpp23fsu3cqhpjdpwusvl6onc2knnrun522ofrutxpz6j@reh3k2ofqjir>

On Thu, Jun 25, 2026 at 12:43:54PM +0200, Jan Kara wrote:
> On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> > Adding Jan and Christian for quota and user_ns knowledge.
> > 
> > On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > > From: Carlos Maiolino <cem@kernel.org>
> > > 
> > > An user reported a bug where he managed to evade group's quota
> > > by changing a file's gid to a different group id the same user
> > > belonged to, even though quotas were enforced on both gids and the
> > > file's size was big enough to exceed the quota's hardlimit.
> > > 
> > > Commit eba0549bc7d1 replaced a capable() call by a
> > > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > > Turns out that both calls have slightly different semantics even though
> > > their documentation seems similar. Where in a nutshell:
> > > 
> > > capable() - Tests the task's effective credentials
> > > has_ns_capability_noaudit() - Tests the task's real credentials
> > 
> > Eww..
> 
> Yeah, that's a catch.
> 
> > > This most of the time has no practical difference but in some cases like
> > > changing attrs (specifically group id in this case) through a NFS client
> > > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > > bypassing quota accounting checks.
> > 
> > Yeah, this does look wrong.  Do the other conversion in the above commit
> > have tthe same issue?
> > 
> > > Using instead ns_capable_noaudit() should fix this issue and prevent
> > > selinux audit messages.
> > 
> > The generic quota code manages to do without either has_capability_noaudit
> > or ns_capable_noaudit.  I think this might be hidden behind
> > inode_owner_or_capable calls.  Any idea why we're different?
> 
> Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
> function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
> nobody complained about generic quota code is that we call
> ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
> for every transaction...

But the capable() call works fine, it's the replacement by
has_capability_noaudit() that enables quota bypass I *think* generic
quota code is safe from this point.

> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Jani Nikula @ 2026-06-25 11:00 UTC (permalink / raw)
  To: Kaitao Cheng, David Laight, Christian König,
	David Hildenbrand (Arm), Alexei Starovoitov
  Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Daniel Borkmann,
	Andrii Nakryiko, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
	Paul E. McKenney, Shakeel Butt, David Howells, Simona Vetter,
	Randy Dunlap, Luca Ceresoli, Philipp Stanner, linux-block,
	linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel, io-uring,
	audit, bpf, netdev, dri-devel, linux-perf-users,
	linux-trace-kernel, kexec, live-patching, linux-modules,
	linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
	damon, llvm, Kaitao Cheng, Muchun Song
In-Reply-To: <0ed6b5c3-e955-46e2-9fc6-075a0dfd1c4f@linux.dev>

On Thu, 25 Jun 2026, Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> 在 2026/6/24 22:23, David Laight 写道:
>> On Wed, 24 Jun 2026 15:23:47 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>> On 6/24/26 15:14, Kaitao Cheng wrote:
>>>> 在 2026/6/22 16:42, David Laight 写道:  
>>>>> On Mon, 22 Jun 2026 12:05:31 +0800
>>>>> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>>>>  
>>>>>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>>
>>>>>> The list_for_each*_safe() helpers are used when the loop body may
>>>>>> remove the current entry.  Their API exposes the temporary cursor at
>>>>>> every call site, even though most users only need it for the iterator
>>>>>> implementation and never reference it in the loop body.
>>>>>>
>>>>>> Add *_mutable() variants for list and hlist iteration.  The new helpers
>>>>>> support both forms: callers may keep passing an explicit temporary cursor
>>>>>> when they need to inspect or reset it, or omit it and let the helper use
>>>>>> a unique internal cursor.  
>>>>>
>>>>> I'm not really sure 'mutable' means anything either.
>>>>> It is possible to make it valid for the loop body (or even other threads)
>>>>> to delete arbitrary list items - but that needs significant extra overheads.
>>>>>
>>>>> It might be worth doing something that doesn't need the extra variable,
>>>>> but there is little point doing all the churn just to rename things.
>>>>>  
>>>>>>
>>>>>> This makes call sites that only mutate the list through the current entry
>>>>>> less noisy, while keeping the existing *_safe() helpers available for
>>>>>> compatibility.
>>>>>>
>>>>>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>> ---
>>>>>>  include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
>>>>>>  1 file changed, 231 insertions(+), 38 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/list.h b/include/linux/list.h
>>>>>> index 09d979976b3b..1081def7cea9 100644
>>>>>> --- a/include/linux/list.h
>>>>>> +++ b/include/linux/list.h
>>>>>> @@ -7,6 +7,7 @@
>>>>>>  #include <linux/stddef.h>
>>>>>>  #include <linux/poison.h>
>>>>>>  #include <linux/const.h>
>>>>>> +#include <linux/args.h>
>>>>>>  
>>>>>>  #include <asm/barrier.h>
>>>>>>  
>>>>>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
>>>>>>  #define list_for_each_prev(pos, head) \
>>>>>>  	for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>>>>>>  
>>>>>> -/**
>>>>>> - * list_for_each_safe - iterate over a list safe against removal of list entry
>>>>>> - * @pos:	the &struct list_head to use as a loop cursor.
>>>>>> - * @n:		another &struct list_head to use as temporary storage
>>>>>> - * @head:	the head for your list.
>>>>>> +/*
>>>>>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
>>>>>>   */
>>>>>>  #define list_for_each_safe(pos, n, head) \
>>>>>>  	for (pos = (head)->next, n = pos->next; \
>>>>>>  	     !list_is_head(pos, (head)); \
>>>>>>  	     pos = n, n = pos->next)
>>>>>>  
>>>>>> +#define __list_for_each_mutable_internal(pos, tmp, head)		\
>>>>>> +	for (typeof(pos) tmp = (pos = (head)->next)->next;		\  
>>>>>
>>>>> Use auto
>>>>>  
>>>>>> +	     !list_is_head(pos, (head));				\
>>>>>> +	     pos = tmp, tmp = pos->next)
>>>>>> +
>>>>>> +#define __list_for_each_mutable1(pos, head)				\
>>>>>> +	__list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
>>>>>> +
>>>>>> +#define __list_for_each_mutable2(pos, next, head)			\
>>>>>> +	list_for_each_safe(pos, next, head)
>>>>>> +
>>>>>>  /**
>>>>>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
>>>>>> + * list_for_each_mutable - iterate over a list safe against entry removal
>>>>>>   * @pos:	the &struct list_head to use as a loop cursor.
>>>>>> - * @n:		another &struct list_head to use as temporary storage
>>>>>> - * @head:	the head for your list.
>>>>>> + * @...:	either (head) or (next, head)
>>>>>> + *
>>>>>> + * next:	another &struct list_head to use as optional temporary storage.
>>>>>> + *		The temporary cursor is internal unless explicitly supplied by
>>>>>> + *		the caller.
>>>>>> + * head:	the head for your list.
>>>>>> + */
>>>>>> +#define list_for_each_mutable(pos, ...)					\
>>>>>> +	CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__))	\
>>>>>> +		(pos, __VA_ARGS__)  
>>>>>
>>>>> The variable argument count logic really just slows down compilation.
>>>>> Maybe there aren't enough copies of this code to make that significant.
>>>>> But just because you can do it doesn't mean it is a gooD idea.
>>>>> I'm also not sure it really adds anything to the readability.
>>>>>
>>>>> And, it you are going to make the middle argument optional there is
>>>>> no need to change the macro name.  
>>>>
>>>> Christian König and Jani Nikula also disagree with the variadic-argument
>>>> implementation approach. If we abandon that method, it means we will
>>>> inevitably need to add some new macros. If mutable is not a good name,
>>>> suggestions for better alternatives would be welcome; coming up with a
>>>> suitable name is indeed rather tricky.  
>>>
>>> I don't think you need to add a new macro for the specific use case that people want to modify the next element of the iteration.
>>>
>>> If I remember your numbers correctly that is a really corner case and keeping using the existing *_safe() macros for that sounds perfectly fine to me.
>> 
>> IIRC currently you have a choice of either:
>> 	define               Item that can't be deleted
>> 	list_for_each()	     The current item.
>> 	list_for_each_safe() The next item.
>> There is also likely to be code that updates the variables to allow
>> for other scenarios.
>> 
>> Note that if increase a reference count and release a lock then list_for_each()
>> is likely safer than list_for_each_safe() :-)
>> 
>> list.h has 9 variants of the 'safe' loop.
>> The bloat of another 9 is getting excessive.
>> 
>> It has to be said that this is one of my least favourite type of list...
>
> Hi Christian König, David Laight, Jani Nikula, David Hildenbrand,
> Andy Shevchenko, Alexei Starovoitov
>
> For ease of discussion, I need to summarize the currently possible
> approaches and briefly describe their respective pros and cons,
> using the list_for_each_entry* interfaces as examples.
>
> 1. Add list_for_each_entry_mutable, while keeping list_for_each_entry
> and list_for_each_entry_safe unchanged. list_for_each_entry_mutable
> would be used specifically for safe deletion scenarios that do not
> need to expose the temporary cursor externally. The code can refer to
> the v1 version.
>
> Pros: Does not depend on immediate per-subsystem adaptation and can be
>       merged directly.
> Cons: Requires adding a whole set of mutable interfaces, which makes the
>       code somewhat redundant.

Seems fine, and the original _safe naming is ambiguous anyway.

> 2. Directly optimize away the temporary cursor in list_for_each_entry_safe
> and define it inside the loop instead, changing the interface from four
> arguments to three.
>
> Pros: Does not add redundant interfaces.
> Cons: (1) Users need to manually update special cases that use the
>       traversal variable of list_for_each_entry_safe, the new
>       list_for_each_entry_safe would no longer apply there and would
>       need to be open-coded.
>       (2) Because the macro arguments changes, all list_for_each_entry_safe
>       callers would need to be modified and merged together, making it
>       difficult to merge such a large amount of code at once.

This won't fly because there are literally thousands of
list_for_each_entry_safe() users.

> 3. Use a variadic macro approach to optimize list_for_each_entry_safe,
> so that it supports both three and four arguments.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) Does not depend on immediate per-subsystem adaptation and can
>       be merged directly.
> Cons: (1) Increases compile time.
>       (2) Makes the interface harder for users to use.

Basically I'm against any variadic macro tricks where the optional
argument is not the last argument. That's just way too surprising, and
goes against common practice in just about all other languages.

> 4. Optimize list_for_each_entry by defining the temporary cursor internally,
> making it compatible with the functionality of list_for_each_entry_safe.
> The code can refer to the v2 version.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) The number of externally visible arguments of list_for_each_entry
>       remains unchanged, still three.
> Cons: (1) list_for_each_entry and list_for_each_entry_safe would be merged
>       into one, and list_for_each_entry_safe would gradually be deprecated.
>       (2) Users need to manually update special cases that use the traversal
>       variable of list_for_each_entry, the new list_for_each_entry would no
>       longer apply there and would need to be open-coded. There are 15 such
>       cases in total.

This sounds good to me, though I take it there's some code size increase
and/or performance penalty?

Maybe the 15 cases are questionable anyway?

> 5. Use a variadic macro approach to optimize list_for_each_entry, so that
> it supports both three and four arguments.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) Does not depend on immediate per-subsystem adaptation and can be
>       merged directly.
> Cons: (1) Increases compile time.
>       (2) list_for_each_entry and list_for_each_entry_safe would be merged
>       into one, and list_for_each_entry_safe would gradually be deprecated.

Please don't do the macro tricks.

> 6. Make no changes, keep the current logic unchanged, and close the current
> email discussion.

I like hiding the temporary stuff when possible.


BR,
Jani.

-- 
Jani Nikula, Intel

^ permalink raw reply

* Re: [PATCH v4] coredump: Add /proc/<pid>/coredump_pre_exit for pre-exit before dumping
From: David Hildenbrand (Arm) @ 2026-06-25 10:57 UTC (permalink / raw)
  To: Christian Brauner, Mike Rapoport, Lorenzo Stoakes, mjguzik,
	pfalcato, ebiederm, viro, jack, jlayton, chuck.lever, alex.aring,
	arnd, keescook, mcgrof, j.granados, allen.lkml
  Cc: linux-fsdevel, linux-kernel, linux-arch, Xin Zhao, linux-mm
In-Reply-To: <20260625-wappnen-drohbrief-wermutstropfen-c53538f01547@brauner>

>> +
>>  #define F_DUPFD		0	/* dup */
>>  #define F_GETFD		1	/* get close_on_exec */
>>  #define F_SETFD		2	/* set/clear close_on_exec */
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index a679b2448234..84f1ee7f32cf 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -1030,6 +1030,18 @@ static int __init coredump_filter_setup(char *s)
>>  
>>  __setup("coredump_filter=", coredump_filter_setup);
>>  
>> +static unsigned long default_dump_pre_exit;
>> +
>> +static int __init coredump_pre_exit_setup(char *s)
>> +{
>> +	default_dump_pre_exit =
>> +		(simple_strtoul(s, NULL, 0) << MMF_DUMP_PRE_EXIT_SHIFT) &
>> +		MMF_DUMP_PRE_EXIT_MASK;
>> +	return 1;
>> +}
>> +
>> +__setup("coredump_pre_exit=", coredump_pre_exit_setup);
> 
> This makes no sense. I think you really need to sit down and think about
> a design for this that doesn't introduce state machinery for boot, mm,
> and the VFS in one shot to solve a fringe problem...

Staring at exit_mmap_mapped_shared(), ... this looks rather hacky ("let's fake
munmap and set some magical flags").

We're essentially saying "we don't want (pretty much) anything that's MAP_SHARED
in the coredump". And for some reason someone should configure that, that's a
rather weird toggle tbh.

And the granularity ("file-backed shared memory") is completely odd.


Aren't there other ways we could optimize this internally?

Like, if we know that a process is dead and cannot run anymore, downgrade writes
to reads (and make sure we block GUP write attempts accordingly), or would that
also not be sufficient?


Another thought:

fs/coredump.c calls get_dump_page().

get_dump_page() will not fault in any memory. So if a page is not in the page
tables at the time of the dump, it will not get included in the coredump. Which
means, that whether most non-anonymous memory will be included in a coredump is
already like playing the lottery.

This is true for MAP_SHARED file mappings and MAP_PRIVATE file mappings without
private modifications.

Which makes me wonder: How much is tooling relying on file-backed pages to end
up in a coredump?

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH] xfs: fix capabily check in xfs_setattr_nonsize
From: Jan Kara @ 2026-06-25 10:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: cem, linux-xfs, stable, Darrick J. Wong, Dave Chinner,
	Eric Sandeen, Dr. Thomas Orgis, Jan Kara, linux-fsdevel,
	Christian Brauner
In-Reply-To: <20260624134039.GB5649@lst.de>

On Wed 24-06-26 15:40:39, Christoph Hellwig wrote:
> Adding Jan and Christian for quota and user_ns knowledge.
> 
> On Wed, Jun 24, 2026 at 12:14:29PM +0200, cem@kernel.org wrote:
> > From: Carlos Maiolino <cem@kernel.org>
> > 
> > An user reported a bug where he managed to evade group's quota
> > by changing a file's gid to a different group id the same user
> > belonged to, even though quotas were enforced on both gids and the
> > file's size was big enough to exceed the quota's hardlimit.
> > 
> > Commit eba0549bc7d1 replaced a capable() call by a
> > has_capability_noaudit() to prevent unnecessary selinux audit messages.
> > Turns out that both calls have slightly different semantics even though
> > their documentation seems similar. Where in a nutshell:
> > 
> > capable() - Tests the task's effective credentials
> > has_ns_capability_noaudit() - Tests the task's real credentials
> 
> Eww..

Yeah, that's a catch.

> > This most of the time has no practical difference but in some cases like
> > changing attrs (specifically group id in this case) through a NFS client
> > this will allow the quota code to use XFS_QMOPT_FORCE_RES, effectively
> > bypassing quota accounting checks.
> 
> Yeah, this does look wrong.  Do the other conversion in the above commit
> have tthe same issue?
> 
> > Using instead ns_capable_noaudit() should fix this issue and prevent
> > selinux audit messages.
> 
> The generic quota code manages to do without either has_capability_noaudit
> or ns_capable_noaudit.  I think this might be hidden behind
> inode_owner_or_capable calls.  Any idea why we're different?

Actually no. Generic quota code has equivalent checks in ignore_hardlimit()
function which does capable(CAP_SYS_RESOURCE) check. I guess the reason why
nobody complained about generic quota code is that we call
ignore_hardlimit() only if we are above hardlimit whereas XFS calls this
for every transaction...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH net v3 1/2] iov_iter: export iov_iter_restore
From: Christian Brauner @ 2026-06-25 10:43 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: netdev, Alexander Viro, Andrew Morton, Arseniy Krasnov,
	David S. Miller, Eric Dumazet, Eugenio Pérez, Jakub Kicinski,
	Jason Wang, kvm, linux-block, linux-fsdevel, linux-kernel,
	Michael S. Tsirkin, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	Stefano Garzarella, virtualization, Xuan Zhuo, Jens Axboe
In-Reply-To: <20260622222757.2130402-2-tavip@google.com>

> Export iov_iter_restore so that it can be used by modules.
> 
> This is needed by the virtio vsock transport (which can be built as a
> module) to restore the msg_iter state when transmission fails.
> 
> Acked-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Octavian Purdila <tavip@google.com>
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 273919b16161..f5df63961fb2 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
>  		i->__iov -= state->nr_segs - i->nr_segs;
>  	i->nr_segs = state->nr_segs;
>  }
> +EXPORT_SYMBOL_GPL(iov_iter_restore);

At least only export it for the module that really needs it. For
example, see:

EXPORT_SYMBOL_FOR_MODULES(__kernel_write, "autofs4");

-- 
Christian Brauner <brauner@kernel.org>

^ permalink raw reply

* Re: [PATCH 1/3] fs, mm: add ->cachestat() file operation
From: Christian Brauner @ 2026-06-25 10:36 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Pavel Tikhomirov, Johannes Weiner, Miklos Szeredi, Alexander Viro,
	Christian Brauner, Jan Kara, Matthew Wilcox (Oracle),
	Andrew Morton, Nhat Pham, Shuah Khan, linux-unionfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest
In-Reply-To: <CAOQ4uxhras5YAr8_HBpuHSDwvfiEF8vAr0NDB0R_PXtGDhnqjQ@mail.gmail.com>

On 2026-06-23 17:34:47+02:00, Amir Goldstein wrote:
> On Tue, Jun 23, 2026 at 4:55 PM Pavel Tikhomirov
> <ptikhomirov@virtuozzo.com> wrote:
> 
> > On 6/23/26 15:48, Johannes Weiner wrote:
> >
> > Yes, AFAIU in overlay when we use realfile we should always use it
> > with_ovl_creds(), even though I don't think there is anything cred related
> > in filemap_cachestat(), I still think we should follow the common pattern
> > other overlay helpers use (similar to ovl_fadvise() and ovl_flush()).
> >
> > note: Actually some places get ovl_real_file() and use it without
> > with_ovl_creds(), e.g.: ovl_read_iter, ovl_write_iter, ovl_splice_read,
> > ovl_splice_write. But those look more of an exception than the general
> > rule. All other instances use with_ovl_creds().
> 
> Use with_ovl_creds() is a good practice to keep the mental security model,
> but it is useless if the security check (can_do_cachestat) is not in the
> vfs helper (vfs_cachestat), so please move it there.
> 
> Also it kind of makes more sense to check (flags != 0) in sys_cachestats
> before checking permissions.
> 
> > Also there are simingly no other file_operations which return "realfile"
> > for further processing, mostly the operation from fsops simply replaces
> > general operation with its own logic completely.
> >
> > Thanks for your review!
> >
> > ps: Hope overlay maintainers will correctly if I'm getting this wrong.
> 
> I don't think this is wrong per-se, except for can_do_cachestat().
> 
> Just be aware that the real file could change from one cachestat
> call to the next (i.e. due to copy up).

I'm really grump about adding a new file operation just for a
special-sauce system call which is under a CONFIG_* option even. We're
not going to set the precedent of piling on custom file operations for a
single filesystem everytime someone adds a new system call unless
absolutely necessary. This looks like it could just use a new helper in
fs/backing_file.c that the cachestat thing can call to use the correct
file.


^ permalink raw reply

* Re: [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: David Hildenbrand (Arm) @ 2026-06-25 10:35 UTC (permalink / raw)
  To: Askar Safin
  Cc: akpm, avagin, axboe, brauner, collin.funk1, david.laight.linux,
	dhowells, fuse-devel, hch, jack, joannelkoong, kernel, linux-api,
	linux-fsdevel, linux-kernel, linux-mm, luto, metze, miklos,
	netdev, patches, pfalcato, torvalds, val, viro, w, willy
In-Reply-To: <20260625101132.3859505-1-safinaskar@gmail.com>

On 6/25/26 12:11, Askar Safin wrote:
> "David Hildenbrand (Arm)" <david@kernel.org>:
>> I think we concluded that we cannot rip out vmsplice that way at this point, and
>> I suspect that Christian will drop that topic branch from -next after -rc1.
> 
> I think my patches still have a chance.

I talked to Christian and it doesn't sound like it.

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH] exec: fix off-by-one in binfmt max rewrite depth comment
From: Christian Brauner @ 2026-06-25 10:27 UTC (permalink / raw)
  To: Kees Cook, Alan Urmancheev; +Cc: linux-mm, linux-fsdevel, linux-kernel, trivial
In-Reply-To: <20260623052322.74711-1-alan.urman@gmail.com>

On Tue, 23 Jun 2026 01:23:22 -0400, Alan Urmancheev wrote:
> exec: fix off-by-one in binfmt max rewrite depth comment

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] exec: fix off-by-one in binfmt max rewrite depth comment
      https://git.kernel.org/vfs/vfs/c/3146aada8527


^ permalink raw reply

* [PATCH v18 5/8] rust: rename `AlwaysRefCounted` to `RefCounted`.
From: Andreas Hindborg @ 2026-06-25 10:15 UTC (permalink / raw)
  To: Danilo Krummrich, Lorenzo Stoakes, Vlastimil Babka,
	Liam R. Howlett, Uladzislau Rezki, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
	Trevor Gross, Daniel Almeida, Tamir Duberstein, Alexandre Courbot,
	Onur Özkan, Lyude Paul, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Rafael J. Wysocki, Dave Ertman, Ira Weiny,
	Leon Romanovsky, Paul Moore, Serge Hallyn, David Airlie,
	Simona Vetter, Alexander Viro, Jan Kara, Igor Korotin,
	Viresh Kumar, Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
	Krzysztof Wilczyński, Pavel Tikhomirov, Michal Wilczynski
  Cc: Andreas Hindborg, Philipp Stanner, rust-for-linux, linux-kernel,
	linux-mm, driver-core, linux-block, linux-security-module,
	dri-devel, linux-fsdevel, linux-pm, linux-pci, linux-pwm,
	Oliver Mangold, Viresh Kumar, Igor Korotin
In-Reply-To: <20260625-unique-ref-v18-0-4e06b5896d47@kernel.org>

From: Oliver Mangold <oliver.mangold@pm.me>

There are types where it may both be reference counted in some cases and
owned in others. In such cases, obtaining `ARef<T>` from `&T` would be
unsound as it allows creation of `ARef<T>` copy from `&Owned<T>`.

Therefore, we split `AlwaysRefCounted` into `RefCounted` (which `ARef<T>`
would require) and a marker trait to indicate that the type is always
reference counted (and not `Ownable`) so the `&T` -> `ARef<T>` conversion
is possible.

- Rename `AlwaysRefCounted` to `RefCounted`.
- Add a new unsafe trait `AlwaysRefCounted`.
- Implement the new trait `AlwaysRefCounted` for the newly renamed
  `RefCounted` implementations. This leaves functionality of existing
  implementers of `AlwaysRefCounted` intact.

Suggested-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Oliver Mangold <oliver.mangold@pm.me>
[ Andreas: Updated commit message and rebase on rust-next (7.2) ]
Acked-by: Igor Korotin <igor.korotin.linux@gmail.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Co-developed-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 rust/kernel/auxiliary.rs        | 10 +++++++-
 rust/kernel/block/mq/request.rs | 19 +++++++++-----
 rust/kernel/cred.rs             | 16 ++++++++++--
 rust/kernel/device.rs           | 12 +++++++--
 rust/kernel/device/property.rs  | 11 ++++++--
 rust/kernel/drm/device.rs       |  9 +++++--
 rust/kernel/drm/gem/mod.rs      | 16 +++++++++---
 rust/kernel/fs/file.rs          | 23 ++++++++++++++---
 rust/kernel/i2c.rs              | 13 +++++++---
 rust/kernel/mm.rs               | 22 +++++++++++++---
 rust/kernel/mm/mmput_async.rs   | 12 +++++++--
 rust/kernel/opp.rs              | 16 +++++++++---
 rust/kernel/owned.rs            |  2 +-
 rust/kernel/pci.rs              | 10 +++++++-
 rust/kernel/pid_namespace.rs    | 15 +++++++++--
 rust/kernel/platform.rs         | 10 +++++++-
 rust/kernel/pwm.rs              | 12 +++++++--
 rust/kernel/sync/aref.rs        | 57 +++++++++++++++++++++++++----------------
 rust/kernel/task.rs             | 13 ++++++++--
 rust/kernel/types.rs            | 12 ++++++---
 rust/kernel/usb.rs              | 17 +++++++++---
 21 files changed, 255 insertions(+), 72 deletions(-)

diff --git a/rust/kernel/auxiliary.rs b/rust/kernel/auxiliary.rs
index c42928d5a2393..854525289c8b4 100644
--- a/rust/kernel/auxiliary.rs
+++ b/rust/kernel/auxiliary.rs
@@ -19,6 +19,10 @@
         to_result, //
     },
     prelude::*,
+    sync::aref::{
+        AlwaysRefCounted,
+        RefCounted, //
+    },
     types::{
         ForLt,
         ForeignOwnable,
@@ -344,7 +348,7 @@ unsafe impl<Ctx: device::DeviceContext> device::AsBusDevice<Ctx> for Device<Ctx>
 kernel::impl_device_context_into_aref!(Device);
 
 // SAFETY: Instances of `Device` are always reference-counted.
-unsafe impl crate::sync::aref::AlwaysRefCounted for Device {
+unsafe impl RefCounted for Device {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
         unsafe { bindings::get_device(self.as_ref().as_raw()) };
@@ -363,6 +367,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
+// `&Device`.
+unsafe impl AlwaysRefCounted for Device {}
+
 impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for Device<Ctx> {
     fn as_ref(&self) -> &device::Device<Ctx> {
         // SAFETY: By the type invariant of `Self`, `self.as_raw()` is a pointer to a valid
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index ce3e30c81cb5e..8dad15ae4cfb0 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -9,7 +9,11 @@
     block::mq::Operations,
     error::Result,
     sync::{
-        aref::{ARef, AlwaysRefCounted},
+        aref::{
+            ARef,
+            AlwaysRefCounted,
+            RefCounted, //
+        },
         atomic::Relaxed,
         Refcount,
     },
@@ -229,11 +233,10 @@ unsafe impl<T: Operations> Send for Request<T> {}
 // mutate `self` are internally synchronized`
 unsafe impl<T: Operations> Sync for Request<T> {}
 
-// SAFETY: All instances of `Request<T>` are reference counted. This
-// implementation of `AlwaysRefCounted` ensure that increments to the ref count
-// keeps the object alive in memory at least until a matching reference count
-// decrement is executed.
-unsafe impl<T: Operations> AlwaysRefCounted for Request<T> {
+// SAFETY: All instances of `Request<T>` are reference counted. This implementation of `RefCounted`
+// ensure that increments to the ref count keeps the object alive in memory at least until a
+// matching reference count decrement is executed.
+unsafe impl<T: Operations> RefCounted for Request<T> {
     fn inc_ref(&self) {
         self.wrapper_ref().refcount().inc();
     }
@@ -255,3 +258,7 @@ unsafe fn dec_ref(obj: core::ptr::NonNull<Self>) {
         }
     }
 }
+
+// SAFETY: We currently do not implement `Ownable`, thus it is okay to obtain an `ARef<Request>`
+// from a `&Request` (but this will change in the future).
+unsafe impl<T: Operations> AlwaysRefCounted for Request<T> {}
diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
index ffa156b9df377..b17736a9adcd5 100644
--- a/rust/kernel/cred.rs
+++ b/rust/kernel/cred.rs
@@ -8,7 +8,15 @@
 //!
 //! Reference: <https://www.kernel.org/doc/html/latest/security/credentials.html>
 
-use crate::{bindings, sync::aref::AlwaysRefCounted, task::Kuid, types::Opaque};
+use crate::{
+    bindings,
+    sync::aref::RefCounted,
+    task::Kuid,
+    types::{
+        AlwaysRefCounted,
+        Opaque, //
+    }, //
+};
 
 /// Wraps the kernel's `struct cred`.
 ///
@@ -76,7 +84,7 @@ pub fn euid(&self) -> Kuid {
 }
 
 // SAFETY: The type invariants guarantee that `Credential` is always ref-counted.
-unsafe impl AlwaysRefCounted for Credential {
+unsafe impl RefCounted for Credential {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference means that the refcount is nonzero.
@@ -90,3 +98,7 @@ unsafe fn dec_ref(obj: core::ptr::NonNull<Credential>) {
         unsafe { bindings::put_cred(obj.cast().as_ptr()) };
     }
 }
+
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Credential>` from a
+// `&Credential`.
+unsafe impl AlwaysRefCounted for Credential {}
diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
index 645afc49a27d6..2e90f6a06fd05 100644
--- a/rust/kernel/device.rs
+++ b/rust/kernel/device.rs
@@ -8,8 +8,12 @@
     bindings,
     fmt,
     prelude::*,
-    sync::aref::ARef,
+    sync::aref::{
+        ARef,
+        RefCounted, //
+    },
     types::{
+        AlwaysRefCounted,
         ForeignOwnable,
         Opaque, //
     }, //
@@ -448,7 +452,7 @@ pub fn name(&self) -> &CStr {
 kernel::impl_device_context_into_aref!(Device);
 
 // SAFETY: Instances of `Device` are always reference-counted.
-unsafe impl crate::sync::aref::AlwaysRefCounted for Device {
+unsafe impl RefCounted for Device {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
         unsafe { bindings::get_device(self.as_raw()) };
@@ -460,6 +464,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
+// `&Device`.
+unsafe impl AlwaysRefCounted for Device {}
+
 // SAFETY: As by the type invariant `Device` can be sent to any thread.
 unsafe impl Send for Device {}
 
diff --git a/rust/kernel/device/property.rs b/rust/kernel/device/property.rs
index 5aead835fbbc0..cee7e25013689 100644
--- a/rust/kernel/device/property.rs
+++ b/rust/kernel/device/property.rs
@@ -14,7 +14,10 @@
     fmt,
     prelude::*,
     str::{CStr, CString},
-    sync::aref::ARef,
+    sync::aref::{
+        ARef,
+        AlwaysRefCounted, //
+    },
     types::Opaque,
 };
 
@@ -360,7 +363,7 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 }
 
 // SAFETY: Instances of `FwNode` are always reference-counted.
-unsafe impl crate::sync::aref::AlwaysRefCounted for FwNode {
+unsafe impl crate::sync::aref::RefCounted for FwNode {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference guarantees that the
         // refcount is non-zero.
@@ -374,6 +377,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<FwNode>` from a
+// `&FwNode`.
+unsafe impl AlwaysRefCounted for FwNode {}
+
 enum Node<'a> {
     Borrowed(&'a FwNode),
     Owned(ARef<FwNode>),
diff --git a/rust/kernel/drm/device.rs b/rust/kernel/drm/device.rs
index 403fc35353c74..368742a258376 100644
--- a/rust/kernel/drm/device.rs
+++ b/rust/kernel/drm/device.rs
@@ -15,7 +15,8 @@
     prelude::*,
     sync::aref::{
         ARef,
-        AlwaysRefCounted, //
+        AlwaysRefCounted,
+        RefCounted, //
     },
     types::Opaque,
     workqueue::{
@@ -227,7 +228,7 @@ fn deref(&self) -> &Self::Target {
 
 // SAFETY: DRM device objects are always reference counted and the get/put functions
 // satisfy the requirements.
-unsafe impl<T: drm::Driver> AlwaysRefCounted for Device<T> {
+unsafe impl<T: drm::Driver> RefCounted for Device<T> {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
         unsafe { bindings::drm_dev_get(self.as_raw()) };
@@ -242,6 +243,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
+// `&Device`.
+unsafe impl<T: drm::Driver> AlwaysRefCounted for Device<T> {}
+
 impl<T: drm::Driver> AsRef<device::Device> for Device<T> {
     fn as_ref(&self) -> &device::Device {
         // SAFETY: `bindings::drm_device::dev` is valid as long as the DRM device itself is valid,
diff --git a/rust/kernel/drm/gem/mod.rs b/rust/kernel/drm/gem/mod.rs
index 01b5bd47a3332..30d3718578fe8 100644
--- a/rust/kernel/drm/gem/mod.rs
+++ b/rust/kernel/drm/gem/mod.rs
@@ -17,7 +17,7 @@
     prelude::*,
     sync::aref::{
         ARef,
-        AlwaysRefCounted, //
+        RefCounted, //
     },
     types::Opaque,
 };
@@ -29,7 +29,7 @@
 #[cfg(CONFIG_RUST_DRM_GEM_SHMEM_HELPER)]
 pub mod shmem;
 
-/// A macro for implementing [`AlwaysRefCounted`] for any GEM object type.
+/// A macro for implementing [`RefCounted`] for any GEM object type.
 ///
 /// Since all GEM objects use the same refcounting scheme.
 #[macro_export]
@@ -42,7 +42,7 @@ impl $( <$( $tparam_id:ident ),+> )? for $type:ty
         )?
     ) => {
         // SAFETY: All GEM objects are refcounted.
-        unsafe impl $( <$( $tparam_id ),+> )? $crate::sync::aref::AlwaysRefCounted for $type
+        unsafe impl $( <$( $tparam_id ),+> )? $crate::sync::aref::RefCounted for $type
         where
             Self: IntoGEMObject,
             $( $( $bind_param : $bind_trait ),+ )?
@@ -61,6 +61,14 @@ unsafe fn dec_ref(obj: core::ptr::NonNull<Self>) {
                 unsafe { bindings::drm_gem_object_put(obj) };
             }
         }
+
+        // SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<$type>` from a
+        // `&$type`.
+        unsafe impl $( <$( $tparam_id ),+> )? $crate::sync::aref::AlwaysRefCounted for $type
+        where
+            Self: IntoGEMObject,
+            $( $( $bind_param : $bind_trait ),+ )?
+        {}
     };
 }
 #[cfg_attr(not(CONFIG_RUST_DRM_GEM_SHMEM_HELPER), allow(unused))]
@@ -98,7 +106,7 @@ fn close(_obj: &<Self::Driver as drm::Driver>::Object, _file: &DriverFile<Self>)
 }
 
 /// Trait that represents a GEM object subtype
-pub trait IntoGEMObject: Sized + super::private::Sealed + AlwaysRefCounted {
+pub trait IntoGEMObject: Sized + super::private::Sealed + RefCounted {
     /// Returns a reference to the raw `drm_gem_object` structure, which must be valid as long as
     /// this owning object is valid.
     fn as_raw(&self) -> *mut bindings::drm_gem_object;
diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index 23ee689bd2400..720e57418358d 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -12,8 +12,15 @@
     cred::Credential,
     error::{code::*, to_result, Error, Result},
     fmt,
-    sync::aref::{ARef, AlwaysRefCounted},
-    types::{NotThreadSafe, Opaque},
+    sync::aref::{
+        ARef,
+        RefCounted, //
+    },
+    types::{
+        AlwaysRefCounted,
+        NotThreadSafe,
+        Opaque, //
+    }, //
 };
 use core::ptr;
 
@@ -197,7 +204,7 @@ unsafe impl Sync for File {}
 
 // SAFETY: The type invariants guarantee that `File` is always ref-counted. This implementation
 // makes `ARef<File>` own a normal refcount.
-unsafe impl AlwaysRefCounted for File {
+unsafe impl RefCounted for File {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference means that the refcount is nonzero.
@@ -212,6 +219,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<File>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<File>` from a
+// `&File`.
+unsafe impl AlwaysRefCounted for File {}
+
 /// Wraps the kernel's `struct file`. Not thread safe.
 ///
 /// This type represents a file that is not known to be safe to transfer across thread boundaries.
@@ -233,7 +244,7 @@ pub struct LocalFile {
 
 // SAFETY: The type invariants guarantee that `LocalFile` is always ref-counted. This implementation
 // makes `ARef<LocalFile>` own a normal refcount.
-unsafe impl AlwaysRefCounted for LocalFile {
+unsafe impl RefCounted for LocalFile {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference means that the refcount is nonzero.
@@ -249,6 +260,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<LocalFile>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<LocalFile>` from a
+// `&LocalFile`.
+unsafe impl AlwaysRefCounted for LocalFile {}
+
 impl LocalFile {
     /// Constructs a new `struct file` wrapper from a file descriptor.
     ///
diff --git a/rust/kernel/i2c.rs b/rust/kernel/i2c.rs
index 624b971ca8b0b..02b2c9220eb11 100644
--- a/rust/kernel/i2c.rs
+++ b/rust/kernel/i2c.rs
@@ -18,7 +18,8 @@
     prelude::*,
     sync::aref::{
         ARef,
-        AlwaysRefCounted, //
+        AlwaysRefCounted,
+        RefCounted, //
     },
     types::Opaque, //
 };
@@ -424,7 +425,7 @@ pub fn get(index: i32) -> Result<ARef<Self>> {
 kernel::impl_device_context_into_aref!(I2cAdapter);
 
 // SAFETY: Instances of `I2cAdapter` are always reference-counted.
-unsafe impl AlwaysRefCounted for I2cAdapter {
+unsafe impl RefCounted for I2cAdapter {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
         unsafe { bindings::i2c_get_adapter(self.index()) };
@@ -435,6 +436,9 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
         unsafe { bindings::i2c_put_adapter(obj.as_ref().as_raw()) }
     }
 }
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from an
+// `&I2cAdapter`.
+unsafe impl AlwaysRefCounted for I2cAdapter {}
 
 /// The i2c board info representation
 ///
@@ -500,7 +504,7 @@ unsafe impl<Ctx: device::DeviceContext> device::AsBusDevice<Ctx> for I2cClient<C
 kernel::impl_device_context_into_aref!(I2cClient);
 
 // SAFETY: Instances of `I2cClient` are always reference-counted.
-unsafe impl AlwaysRefCounted for I2cClient {
+unsafe impl RefCounted for I2cClient {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
         unsafe { bindings::get_device(self.as_ref().as_raw()) };
@@ -511,6 +515,9 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
         unsafe { bindings::put_device(&raw mut (*obj.as_ref().as_raw()).dev) }
     }
 }
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from an
+// `&I2cClient`.
+unsafe impl AlwaysRefCounted for I2cClient {}
 
 impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for I2cClient<Ctx> {
     fn as_ref(&self) -> &device::Device<Ctx> {
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 4764d7b68f2a7..83ed94fca14ca 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -13,8 +13,15 @@
 
 use crate::{
     bindings,
-    sync::aref::{ARef, AlwaysRefCounted},
-    types::{NotThreadSafe, Opaque},
+    sync::aref::{
+        ARef,
+        RefCounted, //
+    },
+    types::{
+        AlwaysRefCounted,
+        NotThreadSafe,
+        Opaque, //
+    }, //
 };
 use core::{ops::Deref, ptr::NonNull};
 
@@ -55,7 +62,7 @@ unsafe impl Send for Mm {}
 unsafe impl Sync for Mm {}
 
 // SAFETY: By the type invariants, this type is always refcounted.
-unsafe impl AlwaysRefCounted for Mm {
+unsafe impl RefCounted for Mm {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The pointer is valid since self is a reference.
@@ -69,6 +76,9 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Mm>` from a `&Mm`.
+unsafe impl AlwaysRefCounted for Mm {}
+
 /// A wrapper for the kernel's `struct mm_struct`.
 ///
 /// This type is like [`Mm`], but with non-zero `mm_users`. It can only be used when `mm_users` can
@@ -91,7 +101,7 @@ unsafe impl Send for MmWithUser {}
 unsafe impl Sync for MmWithUser {}
 
 // SAFETY: By the type invariants, this type is always refcounted.
-unsafe impl AlwaysRefCounted for MmWithUser {
+unsafe impl RefCounted for MmWithUser {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The pointer is valid since self is a reference.
@@ -105,6 +115,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<MmWithUser>` from a
+// `&MmWithUser`.
+unsafe impl AlwaysRefCounted for MmWithUser {}
+
 // Make all `Mm` methods available on `MmWithUser`.
 impl Deref for MmWithUser {
     type Target = Mm;
diff --git a/rust/kernel/mm/mmput_async.rs b/rust/kernel/mm/mmput_async.rs
index b8d2f051225c7..8fbc396e46028 100644
--- a/rust/kernel/mm/mmput_async.rs
+++ b/rust/kernel/mm/mmput_async.rs
@@ -10,7 +10,11 @@
 use crate::{
     bindings,
     mm::MmWithUser,
-    sync::aref::{ARef, AlwaysRefCounted},
+    sync::aref::{
+        ARef,
+        RefCounted, //
+    },
+    types::AlwaysRefCounted,
 };
 use core::{ops::Deref, ptr::NonNull};
 
@@ -34,7 +38,7 @@ unsafe impl Send for MmWithUserAsync {}
 unsafe impl Sync for MmWithUserAsync {}
 
 // SAFETY: By the type invariants, this type is always refcounted.
-unsafe impl AlwaysRefCounted for MmWithUserAsync {
+unsafe impl RefCounted for MmWithUserAsync {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The pointer is valid since self is a reference.
@@ -48,6 +52,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<MmWithUserAsync>`
+// from a `&MmWithUserAsync`.
+unsafe impl AlwaysRefCounted for MmWithUserAsync {}
+
 // Make all `MmWithUser` methods available on `MmWithUserAsync`.
 impl Deref for MmWithUserAsync {
     type Target = MmWithUser;
diff --git a/rust/kernel/opp.rs b/rust/kernel/opp.rs
index 62e44676125d1..b8db6bdefd077 100644
--- a/rust/kernel/opp.rs
+++ b/rust/kernel/opp.rs
@@ -16,8 +16,14 @@
     ffi::{c_char, c_ulong},
     prelude::*,
     str::CString,
-    sync::aref::{ARef, AlwaysRefCounted},
-    types::Opaque,
+    sync::aref::{
+        ARef,
+        RefCounted, //
+    },
+    types::{
+        AlwaysRefCounted,
+        Opaque, //
+    }, //
 };
 
 #[cfg(CONFIG_CPU_FREQ)]
@@ -1041,7 +1047,7 @@ unsafe impl Send for OPP {}
 unsafe impl Sync for OPP {}
 
 /// SAFETY: The type invariants guarantee that [`OPP`] is always refcounted.
-unsafe impl AlwaysRefCounted for OPP {
+unsafe impl RefCounted for OPP {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference means that the refcount is nonzero.
@@ -1055,6 +1061,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<OPP>` from an
+// `&OPP`.
+unsafe impl AlwaysRefCounted for OPP {}
+
 impl OPP {
     /// Creates an owned reference to a [`OPP`] from a valid pointer.
     ///
diff --git a/rust/kernel/owned.rs b/rust/kernel/owned.rs
index 9c92d4a83cc1b..e79936c00002c 100644
--- a/rust/kernel/owned.rs
+++ b/rust/kernel/owned.rs
@@ -27,7 +27,7 @@
 ///
 /// Note: The underlying object is not required to provide internal reference counting, because it
 /// represents a unique, owned reference. If reference counting (on the Rust side) is required,
-/// [`AlwaysRefCounted`](crate::sync::aref::AlwaysRefCounted) should be implemented.
+/// [`RefCounted`](crate::types::RefCounted) should be implemented.
 ///
 /// # Examples
 ///
diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
index 5071cae6543fd..ea9ef99cecb07 100644
--- a/rust/kernel/pci.rs
+++ b/rust/kernel/pci.rs
@@ -19,6 +19,10 @@
     },
     prelude::*,
     str::CStr,
+    sync::aref::{
+        AlwaysRefCounted,
+        RefCounted, //
+    },
     types::Opaque,
     ThisModule, //
 };
@@ -481,7 +485,7 @@ unsafe impl<Ctx: device::DeviceContext> device::AsBusDevice<Ctx> for Device<Ctx>
 impl<'a> crate::dma::Device<'a> for Device<device::Core<'a>> {}
 
 // SAFETY: Instances of `Device` are always reference-counted.
-unsafe impl crate::sync::aref::AlwaysRefCounted for Device {
+unsafe impl RefCounted for Device {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
         unsafe { bindings::pci_dev_get(self.as_raw()) };
@@ -493,6 +497,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
+// `&Device`.
+unsafe impl AlwaysRefCounted for Device {}
+
 impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for Device<Ctx> {
     fn as_ref(&self) -> &device::Device<Ctx> {
         // SAFETY: By the type invariant of `Self`, `self.as_raw()` is a pointer to a valid
diff --git a/rust/kernel/pid_namespace.rs b/rust/kernel/pid_namespace.rs
index 979a9718f153d..067f68b99e8c5 100644
--- a/rust/kernel/pid_namespace.rs
+++ b/rust/kernel/pid_namespace.rs
@@ -7,7 +7,14 @@
 //! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
 //! [`include/linux/pid.h`](srctree/include/linux/pid.h)
 
-use crate::{bindings, sync::aref::AlwaysRefCounted, types::Opaque};
+use crate::{
+    bindings,
+    sync::aref::RefCounted,
+    types::{
+        AlwaysRefCounted,
+        Opaque, //
+    }, //
+};
 use core::ptr;
 
 /// Wraps the kernel's `struct pid_namespace`. Thread safe.
@@ -41,7 +48,7 @@ pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
 }
 
 // SAFETY: Instances of `PidNamespace` are always reference-counted.
-unsafe impl AlwaysRefCounted for PidNamespace {
+unsafe impl RefCounted for PidNamespace {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference means that the refcount is nonzero.
@@ -55,6 +62,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<PidNamespace>` from
+// a `&PidNamespace`.
+unsafe impl AlwaysRefCounted for PidNamespace {}
+
 // SAFETY:
 // - `PidNamespace::dec_ref` can be called from any thread.
 // - It is okay to send ownership of `PidNamespace` across thread boundaries.
diff --git a/rust/kernel/platform.rs b/rust/kernel/platform.rs
index 9b362e0495d32..0ba676445b06d 100644
--- a/rust/kernel/platform.rs
+++ b/rust/kernel/platform.rs
@@ -27,6 +27,10 @@
     },
     of,
     prelude::*,
+    sync::aref::{
+        AlwaysRefCounted,
+        RefCounted, //
+    },
     types::Opaque,
     ThisModule, //
 };
@@ -518,7 +522,7 @@ pub fn optional_irq_by_name(&self, name: &CStr) -> Result<IrqRequest<'_>> {
 impl<'a> crate::dma::Device<'a> for Device<device::Core<'a>> {}
 
 // SAFETY: Instances of `Device` are always reference-counted.
-unsafe impl crate::sync::aref::AlwaysRefCounted for Device {
+unsafe impl RefCounted for Device {
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
         unsafe { bindings::get_device(self.as_ref().as_raw()) };
@@ -530,6 +534,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
+// `&Device`.
+unsafe impl AlwaysRefCounted for Device {}
+
 impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for Device<Ctx> {
     fn as_ref(&self) -> &device::Device<Ctx> {
         // SAFETY: By the type invariant of `Self`, `self.as_raw()` is a pointer to a valid
diff --git a/rust/kernel/pwm.rs b/rust/kernel/pwm.rs
index 6c9d667009ef7..2d1cd74dd98e1 100644
--- a/rust/kernel/pwm.rs
+++ b/rust/kernel/pwm.rs
@@ -13,7 +13,11 @@
     devres,
     error::{self, to_result},
     prelude::*,
-    sync::aref::{ARef, AlwaysRefCounted},
+    sync::aref::{
+        ARef,
+        AlwaysRefCounted,
+        RefCounted, //
+    },
     types::Opaque, //
 };
 use core::{
@@ -629,7 +633,7 @@ pub fn new<'a>(
 }
 
 // SAFETY: Implements refcounting for `Chip` using the embedded `struct device`.
-unsafe impl<T: PwmOps> AlwaysRefCounted for Chip<T> {
+unsafe impl<T: PwmOps> RefCounted for Chip<T> {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: `self.0.get()` points to a valid `pwm_chip` because `self` exists.
@@ -647,6 +651,10 @@ unsafe fn dec_ref(obj: NonNull<Chip<T>>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Chip<T>>` from a
+// `&Chip<T>`.
+unsafe impl<T: PwmOps> AlwaysRefCounted for Chip<T> {}
+
 // SAFETY: `Chip` is a wrapper around `*mut bindings::pwm_chip`. The underlying C
 // structure's state is managed and synchronized by the kernel's device model
 // and PWM core locking mechanisms. Therefore, it is safe to move the `Chip`
diff --git a/rust/kernel/sync/aref.rs b/rust/kernel/sync/aref.rs
index 3bd5eb8a1a526..fb7466a362741 100644
--- a/rust/kernel/sync/aref.rs
+++ b/rust/kernel/sync/aref.rs
@@ -24,11 +24,9 @@
     ptr::NonNull, //
 };
 
-/// Types that are _always_ reference counted.
+/// Types that are internally reference counted.
 ///
 /// It allows such types to define their own custom ref increment and decrement functions.
-/// Additionally, it allows users to convert from a shared reference `&T` to an owned reference
-/// [`ARef<T>`].
 ///
 /// This is usually implemented by wrappers to existing structures on the C side of the code. For
 /// Rust code, the recommendation is to use [`Arc`](crate::sync::Arc) to create reference-counted
@@ -45,9 +43,8 @@
 /// at least until matching decrements are performed.
 ///
 /// Implementers must also ensure that all instances are reference-counted. (Otherwise they
-/// won't be able to honour the requirement that [`AlwaysRefCounted::inc_ref`] keep the object
-/// alive.)
-pub unsafe trait AlwaysRefCounted {
+/// won't be able to honour the requirement that [`RefCounted::inc_ref`] keep the object alive.)
+pub unsafe trait RefCounted {
     /// Increments the reference count on the object.
     fn inc_ref(&self);
 
@@ -60,11 +57,27 @@ pub unsafe trait AlwaysRefCounted {
     /// Callers must ensure that there was a previous matching increment to the reference count,
     /// and that the object is no longer used after its reference count is decremented (as it may
     /// result in the object being freed), unless the caller owns another increment on the refcount
-    /// (e.g., it calls [`AlwaysRefCounted::inc_ref`] twice, then calls
-    /// [`AlwaysRefCounted::dec_ref`] once).
+    /// (e.g., it calls [`RefCounted::inc_ref`] twice, then calls [`RefCounted::dec_ref`] once).
     unsafe fn dec_ref(obj: NonNull<Self>);
 }
 
+/// Always reference-counted type.
+///
+/// It allows deriving a counted reference [`ARef<T>`] from a `&T`.
+///
+/// This provides some convenience, but it allows "escaping" borrow checks on `&T`. As it
+/// complicates attempts to ensure that a reference to T is unique, it is optional to provide for
+/// [`RefCounted`] types. See *Safety* below.
+///
+/// # Safety
+///
+/// Implementers must ensure that no safety invariants are violated by upgrading an `&T` to an
+/// [`ARef<T>`]. In particular that implies [`AlwaysRefCounted`] and [`crate::types::Ownable`]
+/// cannot be implemented for the same type, as this would allow violating the uniqueness guarantee
+/// of [`crate::types::Owned<T>`] by dereferencing it into an `&T` and obtaining an [`ARef`] from
+/// that.
+pub unsafe trait AlwaysRefCounted: RefCounted {}
+
 /// An owned reference to an always-reference-counted object.
 ///
 /// The object's reference count is automatically decremented when an instance of [`ARef`] is
@@ -75,7 +88,7 @@ pub unsafe trait AlwaysRefCounted {
 ///
 /// The pointer stored in `ptr` is non-null and valid for the lifetime of the [`ARef`] instance. In
 /// particular, the [`ARef`] instance owns an increment on the underlying object's reference count.
-pub struct ARef<T: AlwaysRefCounted> {
+pub struct ARef<T: RefCounted> {
     ptr: NonNull<T>,
     _p: PhantomData<T>,
 }
@@ -84,19 +97,19 @@ pub struct ARef<T: AlwaysRefCounted> {
 // it effectively means sharing `&T` (which is safe because `T` is `Sync`); additionally, it needs
 // `T` to be `Send` because any thread that has an `ARef<T>` may ultimately access `T` using a
 // mutable reference, for example, when the reference count reaches zero and `T` is dropped.
-unsafe impl<T: AlwaysRefCounted + Sync + Send> Send for ARef<T> {}
+unsafe impl<T: RefCounted + Sync + Send> Send for ARef<T> {}
 
 // SAFETY: It is safe to send `&ARef<T>` to another thread when the underlying `T` is `Sync`
 // because it effectively means sharing `&T` (which is safe because `T` is `Sync`); additionally,
 // it needs `T` to be `Send` because any thread that has a `&ARef<T>` may clone it and get an
 // `ARef<T>` on that thread, so the thread may ultimately access `T` using a mutable reference, for
 // example, when the reference count reaches zero and `T` is dropped.
-unsafe impl<T: AlwaysRefCounted + Sync + Send> Sync for ARef<T> {}
+unsafe impl<T: RefCounted + Sync + Send> Sync for ARef<T> {}
 
 // Even if `T` is pinned, pointers to `T` can still move.
-impl<T: AlwaysRefCounted> Unpin for ARef<T> {}
+impl<T: RefCounted> Unpin for ARef<T> {}
 
-impl<T: AlwaysRefCounted> ARef<T> {
+impl<T: RefCounted> ARef<T> {
     /// Creates a new instance of [`ARef`].
     ///
     /// It takes over an increment of the reference count on the underlying object.
@@ -125,12 +138,12 @@ pub unsafe fn from_raw(ptr: NonNull<T>) -> Self {
     ///
     /// ```
     /// use core::ptr::NonNull;
-    /// use kernel::sync::aref::{ARef, AlwaysRefCounted};
+    /// use kernel::sync::aref::{ARef, RefCounted};
     ///
     /// struct Empty {}
     ///
     /// # // SAFETY: TODO.
-    /// unsafe impl AlwaysRefCounted for Empty {
+    /// unsafe impl RefCounted for Empty {
     ///     fn inc_ref(&self) {}
     ///     unsafe fn dec_ref(_obj: NonNull<Self>) {}
     /// }
@@ -148,7 +161,7 @@ pub fn into_raw(me: Self) -> NonNull<T> {
     }
 }
 
-impl<T: AlwaysRefCounted> Clone for ARef<T> {
+impl<T: RefCounted> Clone for ARef<T> {
     fn clone(&self) -> Self {
         self.inc_ref();
         // SAFETY: We just incremented the refcount above.
@@ -156,7 +169,7 @@ fn clone(&self) -> Self {
     }
 }
 
-impl<T: AlwaysRefCounted> Deref for ARef<T> {
+impl<T: RefCounted> Deref for ARef<T> {
     type Target = T;
 
     fn deref(&self) -> &Self::Target {
@@ -173,7 +186,7 @@ fn from(b: &T) -> Self {
     }
 }
 
-impl<T: AlwaysRefCounted> Drop for ARef<T> {
+impl<T: RefCounted> Drop for ARef<T> {
     fn drop(&mut self) {
         // SAFETY: The type invariants guarantee that the `ARef` owns the reference we're about to
         // decrement.
@@ -183,19 +196,19 @@ fn drop(&mut self) {
 
 impl<T, U> PartialEq<ARef<U>> for ARef<T>
 where
-    T: AlwaysRefCounted + PartialEq<U>,
-    U: AlwaysRefCounted,
+    T: RefCounted + PartialEq<U>,
+    U: RefCounted,
 {
     #[inline]
     fn eq(&self, other: &ARef<U>) -> bool {
         T::eq(&**self, &**other)
     }
 }
-impl<T: AlwaysRefCounted + Eq> Eq for ARef<T> {}
+impl<T: RefCounted + Eq> Eq for ARef<T> {}
 
 impl<T, U> PartialEq<&'_ U> for ARef<T>
 where
-    T: AlwaysRefCounted + PartialEq<U>,
+    T: RefCounted + PartialEq<U>,
 {
     #[inline]
     fn eq(&self, other: &&U) -> bool {
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 38273f4eedb51..6259430b0ca31 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -10,7 +10,12 @@
     pid_namespace::PidNamespace,
     prelude::*,
     sync::aref::ARef,
-    types::{NotThreadSafe, Opaque},
+    types::{
+        AlwaysRefCounted,
+        NotThreadSafe,
+        Opaque,
+        RefCounted, //
+    },
 };
 use core::{
     ops::Deref,
@@ -347,7 +352,7 @@ pub fn group_leader(&self) -> &Task {
 }
 
 // SAFETY: The type invariants guarantee that `Task` is always refcounted.
-unsafe impl crate::sync::aref::AlwaysRefCounted for Task {
+unsafe impl RefCounted for Task {
     #[inline]
     fn inc_ref(&self) {
         // SAFETY: The existence of a shared reference means that the refcount is nonzero.
@@ -361,6 +366,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Task>` from a
+// `&Task`.
+unsafe impl AlwaysRefCounted for Task {}
+
 impl PartialEq for Task {
     #[inline]
     fn eq(&self, other: &Self) -> bool {
diff --git a/rust/kernel/types.rs b/rust/kernel/types.rs
index c41eab0ec983c..5ef763717e59a 100644
--- a/rust/kernel/types.rs
+++ b/rust/kernel/types.rs
@@ -15,9 +15,15 @@
 pub mod for_lt;
 pub use for_lt::ForLt;
 
-pub use crate::owned::{
-    Ownable,
-    Owned, //
+pub use crate::{
+    owned::{
+        Ownable,
+        Owned, //
+    },
+    sync::aref::{
+        AlwaysRefCounted,
+        RefCounted, //
+    }, //
 };
 
 /// Used to transfer ownership to and from foreign (non-Rust) languages.
diff --git a/rust/kernel/usb.rs b/rust/kernel/usb.rs
index 7aff0c82d0afc..59350c6b0df2a 100644
--- a/rust/kernel/usb.rs
+++ b/rust/kernel/usb.rs
@@ -18,7 +18,10 @@
         to_result, //
     },
     prelude::*,
-    sync::aref::AlwaysRefCounted,
+    sync::aref::{
+        AlwaysRefCounted,
+        RefCounted, //
+    },
     types::Opaque,
     ThisModule, //
 };
@@ -392,7 +395,7 @@ fn as_ref(&self) -> &Device {
 }
 
 // SAFETY: Instances of `Interface` are always reference-counted.
-unsafe impl AlwaysRefCounted for Interface {
+unsafe impl RefCounted for Interface {
     fn inc_ref(&self) {
         // SAFETY: The invariants of `Interface` guarantee that `self.as_raw()`
         // returns a valid `struct usb_interface` pointer, for which we will
@@ -406,6 +409,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Interface>` from a
+// `&Interface`.
+unsafe impl AlwaysRefCounted for Interface {}
+
 // SAFETY: A `Interface` is always reference-counted and can be released from any thread.
 unsafe impl Send for Interface {}
 
@@ -443,7 +450,7 @@ fn as_raw(&self) -> *mut bindings::usb_device {
 kernel::impl_device_context_into_aref!(Device);
 
 // SAFETY: Instances of `Device` are always reference-counted.
-unsafe impl AlwaysRefCounted for Device {
+unsafe impl RefCounted for Device {
     fn inc_ref(&self) {
         // SAFETY: The invariants of `Device` guarantee that `self.as_raw()`
         // returns a valid `struct usb_device` pointer, for which we will
@@ -457,6 +464,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
     }
 }
 
+// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
+// `&Device`.
+unsafe impl AlwaysRefCounted for Device {}
+
 impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for Device<Ctx> {
     fn as_ref(&self) -> &device::Device<Ctx> {
         // SAFETY: By the type invariant of `Self`, `self.as_raw()` is a pointer to a valid

-- 
2.51.2



^ permalink raw reply related

* [PATCH v18 1/8] rust: alloc: add `KBox::into_non_null`
From: Andreas Hindborg @ 2026-06-25 10:15 UTC (permalink / raw)
  To: Danilo Krummrich, Lorenzo Stoakes, Vlastimil Babka,
	Liam R. Howlett, Uladzislau Rezki, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
	Trevor Gross, Daniel Almeida, Tamir Duberstein, Alexandre Courbot,
	Onur Özkan, Lyude Paul, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Rafael J. Wysocki, Dave Ertman, Ira Weiny,
	Leon Romanovsky, Paul Moore, Serge Hallyn, David Airlie,
	Simona Vetter, Alexander Viro, Jan Kara, Igor Korotin,
	Viresh Kumar, Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
	Krzysztof Wilczyński, Pavel Tikhomirov, Michal Wilczynski
  Cc: Andreas Hindborg, Philipp Stanner, rust-for-linux, linux-kernel,
	linux-mm, driver-core, linux-block, linux-security-module,
	dri-devel, linux-fsdevel, linux-pm, linux-pci, linux-pwm
In-Reply-To: <20260625-unique-ref-v18-0-4e06b5896d47@kernel.org>

Add a method to consume a `Box<T, A>` and return a `NonNull<T>`. This
is a convenience wrapper around `Self::into_raw` for callers that need
a `NonNull` pointer rather than a raw pointer.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
---
 rust/kernel/alloc/kbox.rs | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/rust/kernel/alloc/kbox.rs b/rust/kernel/alloc/kbox.rs
index 35d1e015848dd..d534e8adcf7b3 100644
--- a/rust/kernel/alloc/kbox.rs
+++ b/rust/kernel/alloc/kbox.rs
@@ -211,6 +211,15 @@ pub fn leak<'a>(b: Self) -> &'a mut T {
         // which points to an initialized instance of `T`.
         unsafe { &mut *Box::into_raw(b) }
     }
+
+    /// Consumes the `Box<T,A>` and returns a `NonNull<T>`.
+    ///
+    /// Like [`Self::into_raw`], but returns a `NonNull`.
+    #[inline]
+    pub fn into_non_null(b: Self) -> NonNull<T> {
+        // SAFETY: `KBox::into_raw` returns a valid pointer.
+        unsafe { NonNull::new_unchecked(Self::into_raw(b)) }
+    }
 }
 
 impl<T, A> Box<MaybeUninit<T>, A>

-- 
2.51.2



^ permalink raw reply related

* [PATCH v18 3/8] rust: implement `ForeignOwnable` for `Owned`
From: Andreas Hindborg @ 2026-06-25 10:15 UTC (permalink / raw)
  To: Danilo Krummrich, Lorenzo Stoakes, Vlastimil Babka,
	Liam R. Howlett, Uladzislau Rezki, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
	Trevor Gross, Daniel Almeida, Tamir Duberstein, Alexandre Courbot,
	Onur Özkan, Lyude Paul, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Rafael J. Wysocki, Dave Ertman, Ira Weiny,
	Leon Romanovsky, Paul Moore, Serge Hallyn, David Airlie,
	Simona Vetter, Alexander Viro, Jan Kara, Igor Korotin,
	Viresh Kumar, Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
	Krzysztof Wilczyński, Pavel Tikhomirov, Michal Wilczynski
  Cc: Andreas Hindborg, Philipp Stanner, rust-for-linux, linux-kernel,
	linux-mm, driver-core, linux-block, linux-security-module,
	dri-devel, linux-fsdevel, linux-pm, linux-pci, linux-pwm
In-Reply-To: <20260625-unique-ref-v18-0-4e06b5896d47@kernel.org>

Implement `ForeignOwnable` for `Owned<T>`. This allows use of `Owned<T>` in
places such as the `XArray`.

Note that `T` does not need to implement `ForeignOwnable` for `Owned<T>` to
implement `ForeignOwnable`.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 rust/kernel/owned.rs | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/rust/kernel/owned.rs b/rust/kernel/owned.rs
index 7fe9ec3e55126..9c92d4a83cc1b 100644
--- a/rust/kernel/owned.rs
+++ b/rust/kernel/owned.rs
@@ -15,6 +15,8 @@
     ptr::NonNull, //
 };
 
+use kernel::types::ForeignOwnable;
+
 /// Types that specify their own way of performing allocation and destruction. Typically, this trait
 /// is implemented on types from the C side.
 ///
@@ -186,3 +188,54 @@ fn drop(&mut self) {
         unsafe { T::release(self.ptr) };
     }
 }
+
+// SAFETY: We derive the pointer to `T` from a valid `T`, so the returned
+// pointer satisfy alignment requirements of `T`.
+unsafe impl<T: Ownable> ForeignOwnable for Owned<T> {
+    const FOREIGN_ALIGN: usize = core::mem::align_of::<T>();
+
+    type Borrowed<'a>
+        = &'a T
+    where
+        Self: 'a;
+    type BorrowedMut<'a>
+        = Pin<&'a mut T>
+    where
+        Self: 'a;
+
+    #[inline]
+    fn into_foreign(self) -> *mut kernel::ffi::c_void {
+        let ptr = self.ptr.as_ptr().cast();
+        core::mem::forget(self);
+        ptr
+    }
+
+    #[inline]
+    unsafe fn from_foreign(ptr: *mut kernel::ffi::c_void) -> Self {
+        // INVARIANT: By the function safety contract, `ptr` was returned by `into_foreign`, which
+        // gave up exclusive ownership of a valid, pinned `T`; we retake that ownership here.
+        Self {
+            // SAFETY: By function safety contract, `ptr` came from
+            // `into_foreign` and cannot be null.
+            ptr: unsafe { NonNull::new_unchecked(ptr.cast()) },
+        }
+    }
+
+    #[inline]
+    unsafe fn borrow<'a>(ptr: *mut kernel::ffi::c_void) -> Self::Borrowed<'a> {
+        // SAFETY: By function safety requirements, `ptr` is valid for use as a
+        // reference for `'a`.
+        unsafe { &*ptr.cast() }
+    }
+
+    #[inline]
+    unsafe fn borrow_mut<'a>(ptr: *mut kernel::ffi::c_void) -> Self::BorrowedMut<'a> {
+        // SAFETY: By function safety requirements, `ptr` is valid for use as a
+        // unique reference for `'a`.
+        let inner = unsafe { &mut *ptr.cast() };
+
+        // SAFETY: We never move out of inner, and we do not hand out mutable
+        // references when `T: !Unpin`.
+        unsafe { Pin::new_unchecked(inner) }
+    }
+}

-- 
2.51.2



^ permalink raw reply related

* [PATCH v18 0/8] rust: add `Ownable` trait and `Owned` type
From: Andreas Hindborg @ 2026-06-25 10:15 UTC (permalink / raw)
  To: Danilo Krummrich, Lorenzo Stoakes, Vlastimil Babka,
	Liam R. Howlett, Uladzislau Rezki, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
	Trevor Gross, Daniel Almeida, Tamir Duberstein, Alexandre Courbot,
	Onur Özkan, Lyude Paul, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Rafael J. Wysocki, Dave Ertman, Ira Weiny,
	Leon Romanovsky, Paul Moore, Serge Hallyn, David Airlie,
	Simona Vetter, Alexander Viro, Jan Kara, Igor Korotin,
	Viresh Kumar, Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
	Krzysztof Wilczyński, Pavel Tikhomirov, Michal Wilczynski
  Cc: Andreas Hindborg, Philipp Stanner, rust-for-linux, linux-kernel,
	linux-mm, driver-core, linux-block, linux-security-module,
	dri-devel, linux-fsdevel, linux-pm, linux-pci, linux-pwm,
	Asahi Lina, Oliver Mangold, Viresh Kumar, Boqun Feng, Asahi Lina,
	Igor Korotin, Andreas Hindborg

Add a new trait `Ownable` and type `Owned` for types that specify their
own way of performing allocation and destruction. This is useful for
types from the C side.

Implement `ForeignOwnable` for `Owned`.

Convert `Page` to be `Ownable` and add a `from_raw` method.

Add the trait `OwnableRefCounted` that allows conversion between
`ARef` and `Owned`. This is analogous to conversion between `Arc` and
`UniqueArc`.

Patches 1-4 implement `Ownable` and applies it to `Page`. These patches
can be merged on their own.

Patches 5-7 add `Ownable` -> `ARef` interop and can be merged later if
consensus on their shape cannot be reached.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
Changes in v18:
- Rebase on `rust-next` (2026-06-24).
- Drop the `'static` bound on `ForeignOwnable for Owned` (Gary).
- Make `Ownable::release` take a raw pointer instead of `&mut self` (Alice, Sashiko).
- Drop `types::ARef` re-export (Alice).
- Drop unneeded `#[repr(transparent)]` on `Owned` (Gary).
- Fix `FOREIGN_ALIGN` for `Owned` to report the pointee alignment (Sashiko).
- Remove `BorrowedPage`; use `&Page` directly (Alice).
- Update Rust Binder for the `Owned<Page>` conversion (Alice).
- Update `pwm.rs` for the `RefCounted`/`AlwaysRefCounted` split (Sashiko).
- Fix documentation nits: missing `// INVARIANT:` comments, stale `Page` docs, and a stray `mut` (Sashiko).
- Expand the `use` statements touched by the rename patch to the multi-line style (Onur).
- Link to v17: https://msgid.link/20260604-unique-ref-v17-0-7b4c3d2930b9@kernel.org

Changes in v17:
- Rebase on v7.1-rc2.
- Reorder patches so that `Ownable` can merge without `OwnableRefCounted` (Alice).
- Add `#[inline]` directives to short functions added by the series (Gary).
- Link to v16: https://msgid.link/20260224-unique-ref-v16-0-c21afcb118d3@kernel.org

Changes in v16:
- Simplify pointer to reference cast in `Page::from_raw`.
- Use `NonNull<Page>` rather than `Owned<Page>` for `BorrowedPage` internals.
- Use "convertible to reference" wording when converting pointers to references.
- Fix formatting for `Page::from_raw` docs.
- Leave imports alone when adding safety comment to aref example.
- Use `KBox::into_nonnull` for examples.
- Add patch for `KBox::into_nonnull`.
- Change invariants and safety comments of `Ownable` and make the trait safe.
- Make `Ownable::release` take a mutable reference.
- Fix error handling in example for `Ownable`
- Link to v15: https://msgid.link/20260220-unique-ref-v15-0-893ed86b06cc@kernel.org

Changes in v15:
- Update series with original SoB's.
- Rename `AlwaysRefCounted` in `kernel::usb`.
- Rename `Owned::get_pin_mut` to `Owned::as_pin_mut`.
- Link to v14: https://msgid.link/20260204-unique-ref-v14-0-17cb29ebacbb@kernel.org

Changes in v14:
- Rebase on v6.19-rc7.
- Rewrite cover letter.
- Update documentation and safety comments based on v13 feedback.
- Update commit messages.
- Reorder implementation blocks in owned.rs.
- Update example in owned.rs to use try operator rather than `expect`.
- Reformat use statements.
- Add patch: rust: page: convert to `Ownable`.
- Add patch: rust: implement `ForeignOwnable` for `Owned`.
- Add patch: rust: page: add `from_raw()`.
- Link to v13: https://lore.kernel.org/r/20251117-unique-ref-v13-0-b5b243df1250@pm.me

Changes in v13:
- Rebase onto v6.18-rc1 (Andreas's work).
- Documentation and style fixes contributed by Andreas
- Link to v12: https://lore.kernel.org/r/20251001-unique-ref-v12-0-fa5c31f0c0c4@pm.me

Changes in v12:
-
- Rebase onto v6.17-rc1 (Andreas's work).
- moved kernel/types/ownable.rs to kernel/owned.rs
- Drop OwnableMut, make DerefMut depend on Unpin instead. I understood
  ML discussion as that being okay, but probably needs further scrunity.
- Lots of more documentation changes suggested by reviewers.
- Usage example for Ownable/Owned.
- Link to v11: https://lore.kernel.org/r/20250618-unique-ref-v11-0-49eadcdc0aa6@pm.me

Changes in v11:
- Rework of documentation. I tried to honor all requests for changes "in
  spirit" plus some clearifications and corrections of my own.
- Dropping `SimpleOwnedRefCounted` by request from Alice, as it creates a
  potentially problematic blanket implementation (which a derive macro that
  could be created later would not have).
- Dropping Miguel's "kbuild: provide `RUSTC_HAS_DO_NOT_RECOMMEND` symbol"
  patch, as it is not needed anymore after dropping `SimpleOwnedRefCounted`.
  (I can add it again, if it is considered useful anyway).
- Link to v10: https://lore.kernel.org/r/20250502-unique-ref-v10-0-25de64c0307f@pm.me

Changes in v10:
- Moved kernel/ownable.rs to kernel/types/ownable.rs
- Fixes in documentation / comments as suggested by Andreas Hindborg
- Added Reviewed-by comment for Andreas Hindborg
- Fix rustfmt of pid_namespace.rs
- Link to v9: https://lore.kernel.org/r/20250325-unique-ref-v9-0-e91618c1de26@pm.me

Changes in v9:
- Rebase onto v6.14-rc7
- Move Ownable/OwnedRefCounted/Ownable, etc., into separate module
- Documentation fixes to Ownable/OwnableMut/OwnableRefCounted
- Add missing SAFETY documentation to ARef example
- Link to v8: https://lore.kernel.org/r/20250313-unique-ref-v8-0-3082ffc67a31@pm.me

Changes in v8:
- Fix Co-developed-by and Suggested-by tags as suggested by Miguel and Boqun
- Some small documentation fixes in Owned/Ownable patch
- removing redundant trait constraint on DerefMut for Owned as suggested by Boqun Feng
- make SimpleOwnedRefCounted no longer implement RefCounted as suggested by Boqun Feng
- documentation for RefCounted as suggested by Boqun Feng
- Link to v7: https://lore.kernel.org/r/20250310-unique-ref-v7-0-4caddb78aa05@pm.me

Changes in v7:
- Squash patch to make Owned::from_raw/into_raw public into parent
- Added Signed-off-by to other people's commits
- Link to v6: https://lore.kernel.org/r/20250310-unique-ref-v6-0-1ff53558617e@pm.me

Changes in v6:
- Changed comments/formatting as suggested by Miguel Ojeda
- Included and used new config flag RUSTC_HAS_DO_NOT_RECOMMEND,
  thus no changes to types.rs will be needed when the attribute
  becomes available.
- Fixed commit message for Owned patch.
- Link to v5: https://lore.kernel.org/r/20250307-unique-ref-v5-0-bffeb633277e@pm.me

Changes in v5:
- Rebase the whole thing on top of the Ownable/Owned traits by Asahi Lina.
- Rename AlwaysRefCounted to RefCounted and make AlwaysRefCounted a
  marker trait instead to allow to obtain an ARef<T> from an &T,
  which (as Alice pointed out) is unsound when combined with UniqueRef/Owned.
- Change the Trait design and naming to implement this feature,
  UniqueRef/UniqueRefCounted is dropped in favor of Ownable/Owned and
  OwnableRefCounted is used to provide the functions to convert
  between Owned and ARef.
- Link to v4: https://lore.kernel.org/r/20250305-unique-ref-v4-1-a8fdef7b1c2c@pm.me

Changes in v4:
- Just a minor change in naming by request from Andreas Hindborg,
  try_shared_to_unique() -> try_from_shared(),
  unique_to_shared() -> into_shared(),
  which is more in line with standard Rust naming conventions.
- Link to v3: https://lore.kernel.org/r/Z8Wuud2UQX6Yukyr@mango

To: Danilo Krummrich <dakr@kernel.org>
To: Lorenzo Stoakes <ljs@kernel.org>
To: Vlastimil Babka <vbabka@kernel.org>
To: "Liam R. Howlett" <liam@infradead.org>
To: Uladzislau Rezki <urezki@gmail.com>
To: Miguel Ojeda <ojeda@kernel.org>
To: Boqun Feng <boqun@kernel.org>
To: Gary Guo <gary@garyguo.net>
To: Björn Roy Baron <bjorn3_gh@protonmail.com>
To: Benno Lossin <lossin@kernel.org>
To: Andreas Hindborg <a.hindborg@kernel.org>
To: Alice Ryhl <aliceryhl@google.com>
To: Trevor Gross <tmgross@umich.edu>
To: Daniel Almeida <daniel.almeida@collabora.com>
To: Tamir Duberstein <tamird@kernel.org>
To: Alexandre Courbot <acourbot@nvidia.com>
To: Onur Özkan <work@onurozkan.dev>
To: Lyude Paul <lyude@redhat.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Arve Hjønnevåg <arve@android.com>
To: Todd Kjos <tkjos@android.com>
To: Christian Brauner <brauner@kernel.org>
To: Carlos Llamas <cmllamas@google.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
To: Dave Ertman <david.m.ertman@intel.com>
To: Ira Weiny <ira.weiny@intel.com>
To: Leon Romanovsky <leon@kernel.org>
To: Paul Moore <paul@paul-moore.com>
To: Serge Hallyn <sergeh@kernel.org>
To: David Airlie <airlied@gmail.com>
To: Simona Vetter <simona@ffwll.ch>
To: Alexander Viro <viro@zeniv.linux.org.uk>
To: Jan Kara <jack@suse.cz>
To: Igor Korotin <igor.korotin@linux.dev>
To: Viresh Kumar <vireshk@kernel.org>
To: Nishanth Menon <nm@ti.com>
To: Stephen Boyd <sboyd@kernel.org>
To: Bjorn Helgaas <bhelgaas@google.com>
To: Krzysztof Wilczyński <kwilczynski@kernel.org>
To: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
To: Michal Wilczynski <m.wilczynski@samsung.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: rust-for-linux@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: driver-core@lists.linux.dev
Cc: linux-block@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: linux-pwm@vger.kernel.org

---
Andreas Hindborg (3):
      rust: alloc: add `KBox::into_non_null`
      rust: implement `ForeignOwnable` for `Owned`
      rust: page: add `from_raw()`

Asahi Lina (2):
      rust: types: Add Ownable/Owned types
      rust: page: convert to `Ownable`

Oliver Mangold (3):
      rust: rename `AlwaysRefCounted` to `RefCounted`.
      rust: Add missing SAFETY documentation for `ARef` example
      rust: Add `OwnableRefCounted`

 drivers/android/binder/page_range.rs |  10 +-
 rust/kernel/alloc/allocator.rs       |  19 +-
 rust/kernel/alloc/allocator/iter.rs  |   6 +-
 rust/kernel/alloc/kbox.rs            |   9 +
 rust/kernel/auxiliary.rs             |  10 +-
 rust/kernel/block/mq/request.rs      |  19 +-
 rust/kernel/cred.rs                  |  16 +-
 rust/kernel/device.rs                |  12 +-
 rust/kernel/device/property.rs       |  11 +-
 rust/kernel/drm/device.rs            |   9 +-
 rust/kernel/drm/gem/mod.rs           |  16 +-
 rust/kernel/fs/file.rs               |  23 ++-
 rust/kernel/i2c.rs                   |  13 +-
 rust/kernel/lib.rs                   |   1 +
 rust/kernel/mm.rs                    |  22 ++-
 rust/kernel/mm/mmput_async.rs        |  12 +-
 rust/kernel/opp.rs                   |  16 +-
 rust/kernel/owned.rs                 | 371 +++++++++++++++++++++++++++++++++++
 rust/kernel/page.rs                  | 136 +++++--------
 rust/kernel/pci.rs                   |  10 +-
 rust/kernel/pid_namespace.rs         |  15 +-
 rust/kernel/platform.rs              |  10 +-
 rust/kernel/pwm.rs                   |  12 +-
 rust/kernel/sync/aref.rs             |  82 +++++---
 rust/kernel/task.rs                  |  13 +-
 rust/kernel/types.rs                 |  12 ++
 rust/kernel/usb.rs                   |  17 +-
 27 files changed, 721 insertions(+), 181 deletions(-)
---
base-commit: 43a393185e33e573a374c1d4f7ddf6481484ef8d
change-id: 20250305-unique-ref-29fcd675f9e9

Best regards,
--  
Andreas Hindborg <a.hindborg@kernel.org>



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox