Linux block layer
* [PATCH 02/16] blk-crypto: Fold __blk_crypto_cfg_supported() into its caller
From: Eric Biggers @ 2026-06-24  5:03 UTC
  To: linux-fscrypt
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-block,
	Christoph Hellwig, Theodore Ts'o, Andreas Dilger, Baokun Li,
	Jan Kara, Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, Jaegeuk Kim,
	Chao Yu, Eric Biggers
In-Reply-To: <20260624050334.124606-1-ebiggers@kernel.org>

__blk_crypto_cfg_supported() is called only by
blk_crypto_config_supported_natively(), so fold it in.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 block/blk-crypto-profile.c | 22 ----------------------
 block/blk-crypto.c         | 23 +++++++++++++++++++++--
 2 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/block/blk-crypto-profile.c b/block/blk-crypto-profile.c
index cf447ba4a66e..53126c091b0b 100644
--- a/block/blk-crypto-profile.c
+++ b/block/blk-crypto-profile.c
@@ -333,32 +333,10 @@ void blk_crypto_put_keyslot(struct blk_crypto_keyslot *slot)
 		spin_unlock_irqrestore(&profile->idle_slots_lock, flags);
 		wake_up(&profile->idle_slots_wait_queue);
 	}
 }
 
-/**
- * __blk_crypto_cfg_supported() - Check whether the given crypto profile
- *				  supports the given crypto configuration.
- * @profile: the crypto profile to check
- * @cfg: the crypto configuration to check for
- *
- * Return: %true if @profile supports the given @cfg.
- */
-bool __blk_crypto_cfg_supported(struct blk_crypto_profile *profile,
-				const struct blk_crypto_config *cfg)
-{
-	if (!profile)
-		return false;
-	if (!(profile->modes_supported[cfg->crypto_mode] & cfg->data_unit_size))
-		return false;
-	if (profile->max_dun_bytes_supported < cfg->dun_bytes)
-		return false;
-	if (!(profile->key_types_supported & cfg->key_type))
-		return false;
-	return true;
-}
-
 /*
  * This is an internal function that evicts a key from an inline encryption
  * device that can be either a real device or the blk-crypto-fallback "device".
  * It is used only by blk_crypto_evict_key(); see that function for details.
  */
diff --git a/block/blk-crypto.c b/block/blk-crypto.c
index 15e25e41b166..dd83fc5af282 100644
--- a/block/blk-crypto.c
+++ b/block/blk-crypto.c
@@ -349,15 +349,34 @@ int blk_crypto_init_key(struct blk_crypto_key *blk_key,
 
 	return 0;
 }
 EXPORT_SYMBOL_GPL(blk_crypto_init_key);
 
+
+/**
+ * blk_crypto_config_supported_natively() - Check whether a block device
+ *					    supports hardware inline encryption
+ *					    with the given configuration.
+ * @bdev: the block device
+ * @cfg: the crypto configuration to check for
+ *
+ * Return: %true if @bdev supports hardware inline encryption with @cfg.
+ */
 bool blk_crypto_config_supported_natively(struct block_device *bdev,
 					  const struct blk_crypto_config *cfg)
 {
-	return __blk_crypto_cfg_supported(bdev_get_queue(bdev)->crypto_profile,
-					  cfg);
+	struct blk_crypto_profile *profile = bdev_get_queue(bdev)->crypto_profile;
+
+	if (!profile)
+		return false;
+	if (!(profile->modes_supported[cfg->crypto_mode] & cfg->data_unit_size))
+		return false;
+	if (profile->max_dun_bytes_supported < cfg->dun_bytes)
+		return false;
+	if (!(profile->key_types_supported & cfg->key_type))
+		return false;
+	return true;
 }
 
 /*
  * Check if bios with @cfg can be en/decrypted by blk-crypto (i.e. either the
  * block_device it's submitted to supports inline crypto, or the
-- 
2.54.0
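
For readers without the kernel tree at hand, the checks that the two patches
above touch can be modeled in user space. This is only a sketch under stated
assumptions: the demo_* types, enum values, and function names below are
hypothetical stand-ins, not the kernel definitions. It mirrors the four
conditions folded into blk_crypto_config_supported_natively(), plus the
raw-key test that the fallback path uses instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types (assumptions). */
enum demo_mode { DEMO_MODE_AES_256_XTS, DEMO_MODE_MAX };
enum demo_key_type { DEMO_KEY_RAW = 0x1, DEMO_KEY_HW_WRAPPED = 0x2 };

struct demo_cfg {
	enum demo_mode crypto_mode;
	unsigned int data_unit_size;	/* power of two, in bytes */
	unsigned int dun_bytes;
	enum demo_key_type key_type;
};

struct demo_profile {
	/* Per mode: bitmask of the data unit sizes that are supported. */
	unsigned int modes_supported[DEMO_MODE_MAX];
	unsigned int max_dun_bytes_supported;
	unsigned int key_types_supported;	/* bitmask of demo_key_type */
};

/* Same four checks, in the same order, as the folded-in function. */
bool demo_cfg_supported_natively(const struct demo_profile *profile,
				 const struct demo_cfg *cfg)
{
	if (!profile)
		return false;
	if (!(profile->modes_supported[cfg->crypto_mode] & cfg->data_unit_size))
		return false;
	if (profile->max_dun_bytes_supported < cfg->dun_bytes)
		return false;
	if (!(profile->key_types_supported & cfg->key_type))
		return false;
	return true;
}

/* Fallback check as simplified in patch 01: only the key type matters. */
bool demo_fallback_supports(const struct demo_cfg *cfg)
{
	return cfg->key_type == DEMO_KEY_RAW;
}
```

Because the data unit size is a power of two, a single AND against the
per-mode bitmask is enough to test it. A profile that advertises 512 | 4096
for a mode therefore rejects a configuration that asks for 1024.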



* [PATCH 01/16] blk-crypto: Simplify check for fallback support
From: Eric Biggers @ 2026-06-24  5:03 UTC
  To: linux-fscrypt
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-block,
	Christoph Hellwig, Theodore Ts'o, Andreas Dilger, Baokun Li,
	Jan Kara, Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, Jaegeuk Kim,
	Chao Yu, Eric Biggers
In-Reply-To: <20260624050334.124606-1-ebiggers@kernel.org>

Since blk-crypto-fallback supports all blk_crypto_keys except wrapped
keys, just check for that condition directly instead of using
__blk_crypto_cfg_supported().  With this done,
__blk_crypto_cfg_supported() is now used only for the hardware support.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 block/blk-crypto-fallback.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 2a5c52ab74b4..2a8f40a65158 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -494,12 +494,11 @@ bool blk_crypto_fallback_bio_prep(struct bio *bio)
 		/* User didn't call blk_crypto_start_using_key() first */
 		bio_io_error(bio);
 		return false;
 	}
 
-	if (!__blk_crypto_cfg_supported(blk_crypto_fallback_profile,
-					&bc->bc_key->crypto_cfg)) {
+	if (bc->bc_key->crypto_cfg.key_type != BLK_CRYPTO_KEY_TYPE_RAW) {
 		bio_endio_status(bio, BLK_STS_NOTSUPP);
 		return false;
 	}
 
 	if (bio_data_dir(bio) == WRITE) {
-- 
2.54.0



* [PATCH 00/16] fscrypt: Standardize on blk-crypto
From: Eric Biggers @ 2026-06-24  5:03 UTC
  To: linux-fscrypt
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-block,
	Christoph Hellwig, Theodore Ts'o, Andreas Dilger, Baokun Li,
	Jan Kara, Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, Jaegeuk Kim,
	Chao Yu, Eric Biggers

This series can also be retrieved from:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/ fscrypt-blk-crypto-v1

Currently, ext4 and f2fs (i.e., the block-based filesystems with fscrypt
support) have two file contents encryption implementations:

 - Filesystem-layer, where code in fs/crypto/ directly invokes
   crypto_skcipher to en/decrypt data using the CPU.  This
   implementation requires the management of bounce pages at the
   filesystem level.  It doesn't support direct I/O or large folios.

 - blk-crypto (also known as inline encryption), where the filesystem
   assigns bio_crypt_ctxs to bios, which are then processed either by
   the CPU using blk-crypto-fallback.c or by inline encryption hardware.
   This supports direct I/O and is compatible with large folios.

Currently, the latter implementation is enabled only when the
"inlinecrypt" mount option is given.

The persistence of the fs-layer implementation is mainly for historical
reasons, as it came first.  It's becoming increasingly hard to maintain,
especially as the filesystems get refactored to use iomap, large folios,
etc.  It's time to remove it and just rely on the similar code in
blk-crypto-fallback.  This series does that.

Some fs-layer encryption support remains in fs/crypto/ for non-block
based filesystems (UBIFS and CephFS), as well as directories and
symlinks.  So it's not entirely gone, but it's reduced.

To be clear, this just changes an internal implementation detail.  ext4
and f2fs continue to fully support encryption (fscrypt), regardless of
the presence of inline encryption hardware on the system.

Eric Biggers (16):
  blk-crypto: Simplify check for fallback support
  blk-crypto: Fold __blk_crypto_cfg_supported() into its caller
  blk-crypto: Allow control over whether hardware is used
  fscrypt: Fully disallow IV_INO_LBLK_32 with s_blocksize != PAGE_SIZE
  fscrypt: Always use blk-crypto for contents on block-based filesystems
  ext4: Remove fs-layer file contents en/decryption code
  ext4: Make ext4_bio_write_folio() return void
  ext4: Further de-generalize the bio postprocessing code
  f2fs: Remove fs-layer file contents en/decryption code
  fs/buffer: Remove fs-layer decryption code
  fscrypt: Replace calls to fscrypt_inode_uses_inline_crypto()
  fscrypt: Remove fscrypt_dio_supported()
  fscrypt: Remove fs-layer zeroout code
  fscrypt: Remove unused functions and workqueue
  fscrypt: Merge bio.c and inline_crypt.c into block.c
  fscrypt: Add safety checks to non-block-based en/decryption

 Documentation/filesystems/fscrypt.rst       |  39 ++-
 arch/loongarch/configs/loongson32_defconfig |   1 -
 arch/loongarch/configs/loongson64_defconfig |   1 -
 block/blk-crypto-fallback.c                 |   3 +-
 block/blk-crypto-profile.c                  |  22 --
 block/blk-crypto.c                          |  31 ++-
 drivers/md/dm-inlinecrypt.c                 |   2 +-
 fs/buffer.c                                 |  45 +---
 fs/crypto/Kconfig                           |   8 +-
 fs/crypto/Makefile                          |   3 +-
 fs/crypto/bio.c                             | 216 ---------------
 fs/crypto/{inline_crypt.c => block.c}       | 283 +++++++++-----------
 fs/crypto/crypto.c                          | 140 ++++------
 fs/crypto/fscrypt_private.h                 |  28 +-
 fs/crypto/keysetup.c                        |  31 +--
 fs/crypto/policy.c                          |  17 ++
 fs/ext4/crypto.c                            |   2 +-
 fs/ext4/ext4.h                              |   6 +-
 fs/ext4/inode.c                             |  64 +----
 fs/ext4/page-io.c                           |  74 +----
 fs/ext4/readpage.c                          | 140 +++-------
 fs/ext4/super.c                             |   6 +-
 fs/f2fs/compress.c                          |  28 +-
 fs/f2fs/data.c                              |  93 +------
 fs/f2fs/f2fs.h                              |   2 -
 fs/f2fs/file.c                              |   2 -
 fs/f2fs/segment.c                           |   2 -
 fs/f2fs/super.c                             |   2 +-
 include/linux/blk-crypto.h                  |   6 +-
 include/linux/fscrypt.h                     |  96 ++-----
 30 files changed, 357 insertions(+), 1036 deletions(-)
 delete mode 100644 fs/crypto/bio.c
 rename fs/crypto/{inline_crypt.c => block.c} (61%)


base-commit: 1dc18801be29bc54709aa355b8acd80e183b03cd
prerequisite-patch-id: 319d2891e88c7df1ebb5ebf434d18b68f770399f
prerequisite-patch-id: f6157c86deab0ff5ec953ae3ed6b0e84f37741bf
prerequisite-patch-id: 5330c9e4b65644baae81bd177a46be6223d2b494
prerequisite-patch-id: 073cb85332cc58e4b5066bf8f7ac948c0d9a2bac
prerequisite-patch-id: 4b1b7521df7ce7157156dbbc373c699060b21e3f
prerequisite-patch-id: edfd2a34a97697517828f233e478e5b7f8cf85c2
-- 
2.54.0



* blktests failures with v7.1 kernel
From: Shin'ichiro Kawasaki @ 2026-06-24  5:04 UTC
  To: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd, linux-rdma

Hi all,

I ran the latest blktests (git hash: 5a62429536b1) with the v7.1 kernel and
observed the 9 failures listed below. Compared with the previous report for the
v7.1-rc1 kernel [1], 2 failures no longer occur (nvme/045, scsi/002) and 3
failures are new (block/005, nvme/062, nvme/063). As always, your help with
fixes will be appreciated.

[1] https://lore.kernel.org/linux-block/afB5syZbUrppgsDQ@shinmob/


List of failures
================
#1: block/005 (new)
#2: nvme/005 (tcp transport)
#3: nvme/058 (fc transport)(kmemleak)
#4: nvme/060 (rdma transport)
#5: nvme/061 (rdma transport, siw driver)(kmemleak)
#6: nvme/061 (fc transport)
#7: nvme/062 (tcp transport)(new)
#8: nvme/063 (tcp transport)(new)
#9: nbd/002


Failure description
===================

#1: block/005 (new)

    I found that the test case block/005 fails under some conditions due to
    concurrent writes to a sysfs IO scheduler attribute. The failure was
    discussed and a candidate fix patch is available [2].

    [2] https://lore.kernel.org/linux-block/20260623013238.642052-1-shinichiro.kawasaki@wdc.com/

#2: nvme/005 (tcp transport)

    The test case nvme/005 fails for the tcp transport due to a lockdep WARN
    related to the three locks q->q_usage_counter, q->elevator_lock and
    set->srcu. Refer to [1] for the details of the failure. The WARN has two
    causes, so two fixes are required. One fix by Chaitanya is already in the
    v7.1 kernel. The other fix is queued for v7.2-rc1 [3].

    [3] https://lore.kernel.org/linux-nvme/20260604023208.388157-1-shinichiro.kawasaki@wdc.com/

#3: nvme/058 (fc transport)(kmemleak)

    When the kernel has CONFIG_DEBUG_KMEMLEAK enabled, the test case sometimes
    triggers a kmemleak report. With the v7.1-rc1 kernel, this test case caused
    a hang, but the hang is no longer observed with the v7.1 kernel, so the
    kmemleak is easier to reproduce now. The memory leak report with the v7.1
    kernel looks similar to those I reported for the v7.0-rc1 kernel [4].

    [4] https://lore.kernel.org/linux-block/aZ_-cH8euZLySxdD@shinmob/

#4: nvme/060 (rdma transport)

    When the test case is repeated around 50 times for the rdma transport, it
    fails. There are two failure symptoms, and neither looks like a kernel-side
    problem. I posted candidate fix patches on the blktests side [5].

    [5] https://lore.kernel.org/linux-nvme/20260619013329.558580-1-shinichiro.kawasaki@wdc.com/

#5: nvme/061 (rdma transport, siw driver)(kmemleak)

    When the test case nvme/061 is repeated twice for the rdma transport and the
    siw driver on a kernel with CONFIG_DEBUG_KMEMLEAK enabled, it triggers a
    kmemleak report that is detected at the beginning of the 2nd run. Refer to
    the nvme/061 failure report for the v6.19 kernel [6].

    [6] https://lore.kernel.org/linux-block/aY7ZBfMjVIhe_wh3@shinmob/

#6: nvme/061 (fc transport)

    When the test case nvme/061 is repeated around 50 times for the fc
    transport, the test process fails after an Oops and a KASAN null-ptr-deref.
    Refer to the report for the v7.0-rc1 kernel [4].

#7: nvme/062 (tcp transport)(new)

    The test case nvme/062 fails for tcp transport due to the lockdep WARN
    related to the three locks fs_reclaim, set->srcu and sk_lock-AF_INET-NVME.
    q->elevator_lock and q->q_usage_counter are also recorded in the lockdep
    splat [7].

    I ran nvme/062 on the v7.1-rc1 kernel and observed that it failed with a
    lockdep WARN there too. In the past, I did not observe the failure of this
    test case because lockdep had been disabled due to the lockdep WARN at
    nvme/005. Now that nvme/005 no longer reports a lockdep WARN, I see it at
    nvme/062 instead.

    When I applied the fix patch for lockdep WARN at nvme/005 [3], the symptoms
    of the lockdep WARN changed [8]. With the patch, the three locks
    kernfs_rwsem, sparse_irq_lock and kernfs_supers_rwsem caused the WARN. The
    fix patch candidate for block/005 [2] did not affect the failure of
    nvme/062.

#8: nvme/063 (tcp transport)(new)

    The test case nvme/063 fails for tcp transport due to the lockdep WARN
    related to the three locks set->srcu, q->q_usage_counter(io) and
    q->elevator_lock [9].

    I reported the failure of this test case on the v7.1-rc1 kernel together
    with nvme/005, assuming that the failures of nvme/005 and nvme/063 had a
    single cause. But even with the fix for nvme/005 [3] applied, I still
    observe the failure of nvme/063. Therefore, this nvme/063 failure is a
    different problem from the nvme/005 failure. The fix patch candidate for
    block/005 [2] did not affect the failure of nvme/063 either.

#9: nbd/002

    The test case nbd/002 fails due to the lockdep WARN related to
    sk_lock-AF_INET6, cmd->lock and nsock->txlock. The lockdep WARN of this test
    case has been reported since v6.18-rc1 kernel [10]. Eric Dumazet posted a
    fix patch and it is queued for v7.2-rc1 [11]. I confirmed the patch avoids
    the failure. Thanks!

    [10] https://lore.kernel.org/linux-block/ynmi72x5wt5ooljjafebhcarit3pvu6axkslqenikb2p5txe57@ldytqa2t4i2x/
    [11] https://lore.kernel.org/linux-block/20260613042619.1108126-1-edumazet@google.com/
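
As a rough illustration of why lockdep flags the chain in #7, the check can be
thought of as cycle detection on a graph of lock-order dependencies. The sketch
below is a simplified user-space model, not lockdep's actual implementation.
All demo_* names are hypothetical. Only the lock names are taken from the
splat in [7].

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes named after those in the nvme/062 splat [7]. */
enum demo_lock {
	FS_RECLAIM,
	Q_USAGE_COUNTER,
	ELEVATOR_LOCK,
	SET_SRCU,
	SK_LOCK_NVME,
	DEMO_NR_LOCKS
};

/* edge[a][b] is true if lock b was ever acquired while a was held. */
struct demo_graph {
	bool edge[DEMO_NR_LOCKS][DEMO_NR_LOCKS];
};

/* Depth-first search: is lock 'to' reachable from lock 'from'? */
static bool demo_reachable(const struct demo_graph *g, int from, int to,
			   bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < DEMO_NR_LOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    demo_reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired was taken while held was held". Returns false, and
 * records nothing, if the new edge would close a cycle, i.e. if 'held'
 * is already reachable from 'acquired'.
 */
bool demo_add_dependency(struct demo_graph *g, int held, int acquired)
{
	bool seen[DEMO_NR_LOCKS] = { false };

	if (demo_reachable(g, acquired, held, seen))
		return false;
	g->edge[held][acquired] = true;
	return true;
}
```

Feeding the model the existing chain from [7] (fs_reclaim, then
q->q_usage_counter(io), q->elevator_lock, set->srcu and sk_lock-AF_INET-NVME)
succeeds for each edge. The final attempt, taking fs_reclaim while holding
sk_lock-AF_INET-NVME as tlshd does via tcp_set_ulp(), is rejected because it
would close the loop.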


[7] nvme/062 dmesg on v7.1 kernel

[  271.544567] [   T1351] run blktests nvme/062 at 2026-06-24 10:47:29
[  271.810359] [   T1746] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  271.848916] [   T1749] nvmet: Allow non-TLS connections while TLS1.3 is enabled
[  271.869077] [   T1752] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  272.059895] [    T358] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  272.067526] [   T1759] nvme nvme5: creating 4 I/O queues.
[  272.073127] [   T1759] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  272.085957] [   T1759] nvme nvme5: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  272.648494] [   T1813] nvme nvme5: Removing ctrl: NQN "blktests-subsystem-1"

[  272.847561] [   T1822] ======================================================
[  272.848150] [   T1822] WARNING: possible circular locking dependency detected
[  272.848775] [   T1822] 7.1.0 #5 Not tainted
[  272.849107] [   T1822] ------------------------------------------------------
[  272.849704] [   T1822] tlshd/1822 is trying to acquire lock:
[  272.850161] [   T1822] ffffffff9b8ccda0 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc_cache_noprof+0x58/0x720
[  272.850908] [   T1822] 
                          but task is already holding lock:
[  272.851497] [   T1822] ffff8881220ad058 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: do_tcp_setsockopt+0x499/0x26a0
[  272.852360] [   T1822] 
                          which lock already depends on the new lock.

[  272.853170] [   T1822] 
                          the existing dependency chain (in reverse order) is:
[  272.853857] [   T1822] 
                          -> #4 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
[  272.854483] [   T1822]        lock_sock_nested+0x32/0xf0
[  272.854883] [   T1822]        tcp_sendmsg+0x1c/0x50
[  272.855250] [   T1822]        sock_sendmsg+0x305/0x3c0
[  272.855588] [   T1822]        nvme_tcp_try_send_cmd_pdu+0x630/0xcf0 [nvme_tcp]
[  272.856098] [   T1822]        nvme_tcp_try_send+0x1ef/0xa60 [nvme_tcp]
[  272.856608] [   T1822]        nvme_tcp_queue_rq+0xfa3/0x19e0 [nvme_tcp]
[  272.857676] [   T1822]        blk_mq_dispatch_rq_list+0x3e0/0x2420
[  272.858698] [   T1822]        __blk_mq_sched_dispatch_requests+0x20a/0x15d0
[  272.859747] [   T1822]        blk_mq_sched_dispatch_requests+0xab/0x150
[  272.860785] [   T1822]        blk_mq_run_work_fn+0x135/0x2e0
[  272.861720] [   T1822]        process_one_work+0x8b6/0x1650
[  272.862619] [   T1822]        worker_thread+0x5fd/0xfe0
[  272.863473] [   T1822]        kthread+0x36a/0x460
[  272.864299] [   T1822]        ret_from_fork+0x655/0x9d0
[  272.865117] [   T1822]        ret_from_fork_asm+0x1a/0x30
[  272.865963] [   T1822] 
                          -> #3 (set->srcu){.+.+}-{0:0}:
[  272.867350] [   T1822]        __synchronize_srcu+0xe1/0x2f0
[  272.868170] [   T1822]        elevator_switch+0x2bd/0x670
[  272.868993] [   T1822]        elevator_change+0x2e7/0x500
[  272.869805] [   T1822]        elevator_set_none+0xaa/0xf0
[  272.870623] [   T1822]        blk_unregister_queue+0x15e/0x2e0
[  272.871459] [   T1822]        __del_gendisk+0x28f/0xaa0
[  272.872280] [   T1822]        del_gendisk+0x11a/0x1c0
[  272.873074] [   T1822]        nvme_ns_remove+0x331/0x940 [nvme_core]
[  272.873989] [   T1822]        nvme_remove_namespaces+0x289/0x3f0 [nvme_core]
[  272.874941] [   T1822]        nvme_do_delete_ctrl+0xf6/0x160 [nvme_core]
[  272.875811] [   T1822]        nvme_delete_ctrl_sync.cold+0x8/0xd [nvme_core]
[  272.876761] [   T1822]        nvme_sysfs_delete+0xba/0xe0 [nvme_core]
[  272.877636] [   T1822]        kernfs_fop_write_iter+0x3d6/0x5e0
[  272.878441] [   T1822]        vfs_write+0x4b3/0xf70
[  272.879157] [   T1822]        ksys_write+0x112/0x250
[  272.879908] [   T1822]        do_syscall_64+0xdf/0x790
[  272.880698] [   T1822]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  272.881499] [   T1822] 
                          -> #2 (&q->elevator_lock){+.+.}-{4:4}:
[  272.882720] [   T1822]        __mutex_lock+0x1ae/0x2650
[  272.883397] [   T1822]        elevator_change+0x197/0x500
[  272.884139] [   T1822]        elv_iosched_store+0x38f/0x430
[  272.884926] [   T1822]        queue_attr_store+0x25f/0x3e0
[  272.885685] [   T1822]        kernfs_fop_write_iter+0x3d6/0x5e0
[  272.886433] [   T1822]        vfs_write+0x4b3/0xf70
[  272.887112] [   T1822]        ksys_write+0x112/0x250
[  272.887838] [   T1822]        do_syscall_64+0xdf/0x790
[  272.888534] [   T1822]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  272.889303] [   T1822] 
                          -> #1 (&q->q_usage_counter(io)){++++}-{0:0}:
[  272.890567] [   T1822]        blk_alloc_queue+0x605/0x7a0
[  272.891320] [   T1822]        blk_mq_alloc_queue+0x168/0x270
[  272.892058] [   T1822]        scsi_alloc_sdev+0x86c/0xcd0
[  272.892808] [   T1822]        scsi_probe_and_add_lun+0x601/0xc50
[  272.893568] [   T1822]        __scsi_add_device+0x233/0x280
[  272.894309] [   T1822]        ata_scsi_scan_host+0x137/0x3a0
[  272.895032] [   T1822]        async_run_entry_fn+0x93/0x550
[  272.895766] [   T1822]        process_one_work+0x8b6/0x1650
[  272.896503] [   T1822]        worker_thread+0x5fd/0xfe0
[  272.897199] [   T1822]        kthread+0x36a/0x460
[  272.897824] [   T1822]        ret_from_fork+0x655/0x9d0
[  272.898528] [   T1822]        ret_from_fork_asm+0x1a/0x30
[  272.899251] [   T1822] 
                          -> #0 (fs_reclaim){+.+.}-{0:0}:
[  272.900443] [   T1822]        __lock_acquire+0xe06/0x22e0
[  272.901136] [   T1822]        lock_acquire+0x1a5/0x330
[  272.901839] [   T1822]        fs_reclaim_acquire+0xd5/0x120
[  272.902588] [   T1822]        __kmalloc_cache_noprof+0x58/0x720
[  272.903333] [   T1822]        __request_module+0x253/0x610
[  272.904050] [   T1822]        tcp_set_ulp+0x395/0x5e0
[  272.904769] [   T1822]        do_tcp_setsockopt+0x4a9/0x26a0
[  272.905492] [   T1822]        do_sock_setsockopt+0x163/0x3b0
[  272.906235] [   T1822]        __sys_setsockopt+0xe0/0x150
[  272.906965] [   T1822]        __x64_sys_setsockopt+0xb9/0x180
[  272.907734] [   T1822]        do_syscall_64+0xdf/0x790
[  272.908368] [   T1822]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  272.909049] [   T1822] 
                          other info that might help us debug this:

[  272.910750] [   T1822] Chain exists of:
                            fs_reclaim --> set->srcu --> sk_lock-AF_INET-NVME

[  272.912667] [   T1822]  Possible unsafe locking scenario:

[  272.913915] [   T1822]        CPU0                    CPU1
[  272.914569] [   T1822]        ----                    ----
[  272.915244] [   T1822]   lock(sk_lock-AF_INET-NVME);
[  272.915921] [   T1822]                                lock(set->srcu);
[  272.916742] [   T1822]                                lock(sk_lock-AF_INET-NVME);
[  272.917634] [   T1822]   lock(fs_reclaim);
[  272.918266] [   T1822] 
                           *** DEADLOCK ***

[  272.919861] [   T1822] 1 lock held by tlshd/1822:
[  272.920527] [   T1822]  #0: ffff8881220ad058 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: do_tcp_setsockopt+0x499/0x26a0
[  272.921589] [   T1822] 
                          stack backtrace:
[  272.922693] [   T1822] CPU: 2 UID: 0 PID: 1822 Comm: tlshd Not tainted 7.1.0 #5 PREEMPT(full) 
[  272.922696] [   T1822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-10.fc44 06/10/2025
[  272.922698] [   T1822] Call Trace:
[  272.922701] [   T1822]  <TASK>
[  272.922703] [   T1822]  dump_stack_lvl+0x6a/0x90
[  272.922708] [   T1822]  print_circular_bug.cold+0x189/0x1eb
[  272.922713] [   T1822]  check_noncircular+0x173/0x1a0
[  272.922717] [   T1822]  __lock_acquire+0xe06/0x22e0
[  272.922720] [   T1822]  lock_acquire+0x1a5/0x330
[  272.922722] [   T1822]  ? __kmalloc_cache_noprof+0x58/0x720
[  272.922724] [   T1822]  ? do_raw_spin_lock+0x12d/0x280
[  272.922727] [   T1822]  ? __request_module+0x253/0x610
[  272.922728] [   T1822]  fs_reclaim_acquire+0xd5/0x120
[  272.922731] [   T1822]  ? __kmalloc_cache_noprof+0x58/0x720
[  272.922733] [   T1822]  __kmalloc_cache_noprof+0x58/0x720
[  272.922735] [   T1822]  ? lockdep_hardirqs_on+0x8c/0x130
[  272.922738] [   T1822]  ? rcu_is_watching+0x11/0xb0
[  272.922742] [   T1822]  ? tcp_set_ulp+0x395/0x5e0
[  272.922744] [   T1822]  __request_module+0x253/0x610
[  272.922747] [   T1822]  ? __pfx___request_module+0x10/0x10
[  272.922750] [   T1822]  ? lock_acquire+0x1a5/0x330
[  272.922751] [   T1822]  ? rcu_is_watching+0x11/0xb0
[  272.922753] [   T1822]  ? cap_capable+0x1b7/0x3b0
[  272.922756] [   T1822]  ? lock_acquire+0x1b5/0x330
[  272.922757] [   T1822]  ? find_held_lock+0x2b/0x80
[  272.922761] [   T1822]  ? tcp_set_ulp+0x374/0x5e0
[  272.922763] [   T1822]  tcp_set_ulp+0x395/0x5e0
[  272.922766] [   T1822]  do_tcp_setsockopt+0x4a9/0x26a0
[  272.922769] [   T1822]  ? __pfx_do_tcp_setsockopt+0x10/0x10
[  272.922771] [   T1822]  ? __lock_acquire+0x3d2/0x22e0
[  272.922772] [   T1822]  ? lock_acquire+0x1a5/0x330
[  272.922774] [   T1822]  ? folio_add_file_rmap_ptes+0x7b6/0xa90
[  272.922779] [   T1822]  ? do_raw_spin_lock+0x12d/0x280
[  272.922780] [   T1822]  ? percpu_counter_add_batch+0x89/0x280
[  272.922785] [   T1822]  ? __pfx_selinux_netlbl_socket_setsockopt+0x10/0x10
[  272.922788] [   T1822]  ? find_held_lock+0x2b/0x80
[  272.922792] [   T1822]  do_sock_setsockopt+0x163/0x3b0
[  272.922795] [   T1822]  ? __pfx_do_sock_setsockopt+0x10/0x10
[  272.922798] [   T1822]  __sys_setsockopt+0xe0/0x150
[  272.922802] [   T1822]  __x64_sys_setsockopt+0xb9/0x180
[  272.922804] [   T1822]  ? rcu_read_unlock+0x17/0x60
[  272.922807] [   T1822]  ? lock_release+0x1b5/0x340
[  272.922809] [   T1822]  do_syscall_64+0xdf/0x790
[  272.922811] [   T1822]  ? rcu_read_unlock+0x1c/0x60
[  272.922813] [   T1822]  ? do_fault+0x8fc/0x13c0
[  272.922815] [   T1822]  ? rcu_read_unlock+0x17/0x60
[  272.922817] [   T1822]  ? lock_release+0x1b5/0x340
[  272.922819] [   T1822]  ? __handle_mm_fault+0x10ef/0x1d60
[  272.922822] [   T1822]  ? __lock_acquire+0x3d2/0x22e0
[  272.922824] [   T1822]  ? __pfx___css_rstat_updated+0x10/0x10
[  272.922829] [   T1822]  ? lock_acquire+0x1a5/0x330
[  272.922831] [   T1822]  ? count_memcg_events_mm.constprop.0+0x22/0x130
[  272.922833] [   T1822]  ? rcu_is_watching+0x11/0xb0
[  272.922835] [   T1822]  ? count_memcg_events+0x107/0x4e0
[  272.922839] [   T1822]  ? find_held_lock+0x2b/0x80
[  272.922841] [   T1822]  ? rcu_read_unlock+0x17/0x60
[  272.922843] [   T1822]  ? lock_release+0x1b5/0x340
[  272.922845] [   T1822]  ? find_held_lock+0x2b/0x80
[  272.922847] [   T1822]  ? exc_page_fault+0x94/0x140
[  272.922849] [   T1822]  ? lock_release+0x1b5/0x340
[  272.922851] [   T1822]  ? rcu_is_watching+0x11/0xb0
[  272.922857] [   T1822]  ? trace_hardirqs_on+0x14/0x1b0
[  272.922860] [   T1822]  ? preempt_count_add+0x7f/0x190
[  272.922864] [   T1822]  ? do_syscall_64+0x5d/0x790
[  272.922865] [   T1822]  ? do_syscall_64+0x8d/0x790
[  272.922867] [   T1822]  ? irqentry_exit+0xfc/0x790
[  272.922869] [   T1822]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  272.922871] [   T1822] RIP: 0033:0x7fa53a2da75e
[  272.922875] [   T1822] Code: 55 48 63 c9 48 63 ff 45 89 c9 48 89 e5 48 83 ec 08 6a 2c e8 54 69 f7 ff c9 c3 66 90 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 61
[  272.922877] [   T1822] RSP: 002b:00007ffc6461e8e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000036
[  272.922880] [   T1822] RAX: ffffffffffffffda RBX: 000055d6e364a750 RCX: 00007fa53a2da75e
[  272.922881] [   T1822] RDX: 000000000000001f RSI: 0000000000000006 RDI: 0000000000000005
[  272.922882] [   T1822] RBP: 00007ffc6461e930 R08: 0000000000000004 R09: 0000000000000070
[  272.922883] [   T1822] R10: 000055d6b7b7be2a R11: 0000000000000206 R12: 000055d6e3657c30
[  272.922884] [   T1822] R13: 00007ffc6461e904 R14: 00007ffc6461e990 R15: 000055d6e364a7c0
[  272.922889] [   T1822]  </TASK>
[  273.088107] [    T296] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349, TLS.
[  273.100521] [   T1820] nvme nvme5: creating 4 I/O queues.
[  273.143469] [   T1820] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  273.147971] [   T1820] nvme nvme5: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  273.463094] [   T1890] nvme nvme5: Removing ctrl: NQN "blktests-subsystem-1"
[  273.581465] [   T1903] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  273.611642] [   T1909] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  273.697895] [   T1916] nvme_tcp: queue 0: failed to receive icresp, error -104
[  273.800385] [    T358] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349, TLS.
[  273.810247] [   T1926] nvme nvme5: creating 4 I/O queues.
[  273.844828] [   T1926] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  273.850998] [   T1926] nvme nvme5: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  274.163114] [   T1985] nvme nvme5: Removing ctrl: NQN "blktests-subsystem-1"


[8] nvme/062 dmesg on v7.1 kernel + nvme-tcp lockdep fix patch

[  327.536320] [   T1023] run blktests nvme/062 at 2026-06-24 12:43:52
[  327.852000] [   T1119] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  327.896673] [   T1122] nvmet: Allow non-TLS connections while TLS1.3 is enabled
[  327.913286] [   T1125] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  328.118861] [    T350] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  328.129891] [   T1132] nvme nvme5: creating 4 I/O queues.
[  328.141710] [   T1132] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  328.149625] [   T1132] nvme nvme5: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  329.072473] [   T1185] nvme nvme5: Removing ctrl: NQN "blktests-subsystem-1"
[  329.463408] [    T167] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349, TLS.
[  329.481208] [   T1192] nvme nvme5: creating 4 I/O queues.
[  329.562530] [   T1192] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  329.580181] [   T1192] nvme nvme5: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  330.583680] [   T1261] nvme nvme5: Removing ctrl: NQN "blktests-subsystem-1"
[  330.852275] [   T1274] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  330.914316] [   T1280] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  331.109290] [   T1287] nvme_tcp: queue 0: failed to receive icresp, error -104

[  331.117766] [   T1287] ======================================================
[  331.118639] [   T1287] WARNING: possible circular locking dependency detected
[  331.119510] [   T1287] 7.1.0+ #407 Not tainted
[  331.120120] [   T1287] ------------------------------------------------------
[  331.120993] [   T1287] nvme/1287 is trying to acquire lock:
[  331.121627] [   T1287] ffff888100923180 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove_by_name_ns+0x53/0x160
[  331.122934] [   T1287] 
                          but task is already holding lock:
[  331.123841] [   T1287] ffff8881009232a0 (&root->kernfs_supers_rwsem){++++}-{4:4}, at: kernfs_remove_by_name_ns+0x4b/0x160
[  331.125143] [   T1287] 
                          which lock already depends on the new lock.

[  331.126388] [   T1287] 
                          the existing dependency chain (in reverse order) is:
[  331.127475] [   T1287] 
                          -> #9 (&root->kernfs_supers_rwsem){++++}-{4:4}:
[  331.128528] [   T1287]        down_read+0xa6/0x4c0
[  331.129147] [   T1287]        kernfs_remove_by_name_ns+0x4b/0x160
[  331.129900] [   T1287]        remove_files+0x8d/0x1b0
[  331.130474] [   T1287]        sysfs_remove_group+0x78/0x170
[  331.131164] [   T1287]        sysfs_remove_groups+0x63/0xd0
[  331.132307] [   T1287]        __kobject_del+0x7d/0x1e0
[  331.133391] [   T1287]        kobject_del+0x34/0x60
[  331.134437] [   T1287]        free_desc+0x184/0x1a0
[  331.135485] [   T1287]        irq_free_descs+0x4d/0x70
[  331.136554] [   T1287]        msi_domain_free_locked.part.0+0x492/0x690
[  331.137838] [   T1287]        msi_domain_free_irqs_all_locked+0xe9/0x140
[  331.139056] [   T1287]        pci_free_msi_irqs+0x12/0x90
[  331.140124] [   T1287]        pci_disable_msix+0xab/0xf0
[  331.141173] [   T1287]        pci_free_irq_vectors+0x12/0xe0
[  331.142299] [   T1287]        nvme_setup_io_queues+0x5d6/0x16c0 [nvme]
[  331.143485] [   T1287]        nvme_probe.cold+0x30f/0x65a [nvme]
[  331.144607] [   T1287]        local_pci_probe+0xdf/0x190
[  331.145620] [   T1287]        pci_call_probe+0x160/0x6d0
[  331.146635] [   T1287]        pci_device_probe+0x179/0x2f0
[  331.147660] [   T1287]        really_probe+0x1ed/0x900
[  331.148641] [   T1287]        __driver_probe_device+0x1d2/0x420
[  331.149699] [   T1287]        driver_probe_device+0x4a/0x120
[  331.150719] [   T1287]        __driver_attach_async_helper+0x10b/0x280
[  331.151827] [   T1287]        async_run_entry_fn+0x93/0x550
[  331.152796] [   T1287]        process_one_work+0x8b2/0x1640
[  331.153764] [   T1287]        worker_thread+0x5fd/0xfe0
[  331.154708] [   T1287]        kthread+0x367/0x460
[  331.155581] [   T1287]        ret_from_fork+0x655/0x9d0
[  331.156502] [   T1287]        ret_from_fork_asm+0x1a/0x30
[  331.157434] [   T1287] 
                          -> #8 (sparse_irq_lock){+.+.}-{4:4}:
[  331.158932] [   T1287]        __mutex_lock+0x1ae/0x2640
[  331.159820] [   T1287]        cpuhp_bringup_ap+0x52/0x950
[  331.160725] [   T1287]        cpuhp_invoke_callback+0x2d1/0x12e0
[  331.161725] [   T1287]        __cpuhp_invoke_callback_range+0xb6/0x1e0
[  331.162771] [   T1287]        _cpu_up+0x2eb/0x6d0
[  331.163604] [   T1287]        cpu_up+0x111/0x190
[  331.164436] [   T1287]        cpuhp_bringup_mask+0xd3/0x110
[  331.165358] [   T1287]        smp_init+0x27/0xe0
[  331.166183] [   T1287]        kernel_init_freeable+0x442/0x710
[  331.167120] [   T1287]        kernel_init+0x18/0x150
[  331.167951] [   T1287]        ret_from_fork+0x655/0x9d0
[  331.168804] [   T1287]        ret_from_fork_asm+0x1a/0x30
[  331.169665] [   T1287] 
                          -> #7 (cpu_hotplug_lock){++++}-{0:0}:
[  331.171064] [   T1287]        cpus_read_lock+0x3c/0xe0
[  331.171902] [   T1287]        static_key_disable+0x12/0x30
[  331.172773] [   T1287]        inet_hash+0xf3/0xd00
[  331.173571] [   T1287]        inet_csk_listen_start+0x350/0x440
[  331.174508] [   T1287]        __inet_listen_sk+0x191/0x650
[  331.175390] [   T1287]        inet_listen+0x9a/0xe0
[  331.176203] [   T1287]        __sys_listen+0x85/0x100
[  331.177018] [   T1287]        __x64_sys_listen+0x4e/0x90
[  331.177860] [   T1287]        do_syscall_64+0xdf/0x790
[  331.178686] [   T1287]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  331.179686] [   T1287] 
                          -> #6 (sk_lock-AF_INET){+.+.}-{0:0}:
[  331.181060] [   T1287]        lock_sock_nested+0x32/0xf0
[  331.181906] [   T1287]        tls_sw_sendmsg+0x1b4/0x23b0 [tls]
[  331.182832] [   T1287]        sock_sendmsg+0x305/0x3c0
[  331.183661] [   T1287]        nvmet_tcp_try_recv_pdu+0x150d/0x1d10 [nvmet_tcp]
[  331.184742] [   T1287]        nvmet_tcp_io_work+0x122/0x2420 [nvmet_tcp]
[  331.185769] [   T1287]        process_one_work+0x8b2/0x1640
[  331.186665] [   T1287]        worker_thread+0x5fd/0xfe0
[  331.187515] [   T1287]        kthread+0x367/0x460
[  331.188312] [   T1287]        ret_from_fork+0x655/0x9d0
[  331.189171] [   T1287]        ret_from_fork_asm+0x1a/0x30
[  331.190030] [   T1287] 
                          -> #5 (&ctx->tx_lock){+.+.}-{4:4}:
[  331.191388] [   T1287]        __mutex_lock+0x1ae/0x2640
[  331.192245] [   T1287]        tls_sw_sendmsg+0x130/0x23b0 [tls]
[  331.193187] [   T1287]        sock_sendmsg+0x305/0x3c0
[  331.194015] [   T1287]        nvme_tcp_try_send_cmd_pdu+0x630/0xcf0 [nvme_tcp]
[  331.195097] [   T1287]        nvme_tcp_try_send+0x1ef/0xa60 [nvme_tcp]
[  331.196106] [   T1287]        nvme_tcp_queue_rq+0xfa3/0x19e0 [nvme_tcp]
[  331.197123] [   T1287]        blk_mq_dispatch_rq_list+0x3e0/0x2420
[  331.198088] [   T1287]        __blk_mq_sched_dispatch_requests+0x20a/0x15d0
[  331.199145] [   T1287]        blk_mq_sched_dispatch_requests+0xa7/0x140
[  331.200166] [   T1287]        blk_mq_run_work_fn+0x135/0x2e0
[  331.201075] [   T1287]        process_one_work+0x8b2/0x1640
[  331.201962] [   T1287]        worker_thread+0x5fd/0xfe0
[  331.202808] [   T1287]        kthread+0x367/0x460
[  331.203593] [   T1287]        ret_from_fork+0x655/0x9d0
[  331.204458] [   T1287]        ret_from_fork_asm+0x1a/0x30
[  331.205336] [   T1287] 
                          -> #4 (set->srcu){.+.+}-{0:0}:
[  331.206663] [   T1287]        __synchronize_srcu+0xe1/0x2f0
[  331.207558] [   T1287]        elevator_switch+0x2bd/0x670
[  331.208462] [   T1287]        elevator_change+0x2e7/0x500
[  331.209346] [   T1287]        elevator_set_none+0xaa/0xf0
[  331.210226] [   T1287]        blk_unregister_queue+0x15e/0x2e0
[  331.211157] [   T1287]        __del_gendisk+0x28b/0xaa0
[  331.212007] [   T1287]        del_gendisk+0x11a/0x1c0
[  331.212834] [   T1287]        nvme_ns_remove+0x331/0x940 [nvme_core]
[  331.213828] [   T1287]        nvme_remove_namespaces+0x289/0x3f0 [nvme_core]
[  331.214896] [   T1287]        nvme_do_delete_ctrl+0xf6/0x160 [nvme_core]
[  331.215931] [   T1287]        nvme_delete_ctrl_sync.cold+0x8/0xd [nvme_core]
[  331.217003] [   T1287]        nvme_sysfs_delete+0xb7/0xe0 [nvme_core]
[  331.218011] [   T1287]        kernfs_fop_write_iter+0x3d6/0x5e0
[  331.218942] [   T1287]        vfs_write+0x4b3/0xf70
[  331.219759] [   T1287]        ksys_write+0x112/0x250
[  331.220594] [   T1287]        do_syscall_64+0xdf/0x790
[  331.221456] [   T1287]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  331.222474] [   T1287] 
                          -> #3 (&q->elevator_lock){+.+.}-{4:4}:
[  331.223909] [   T1287]        __mutex_lock+0x1ae/0x2640
[  331.224764] [   T1287]        elevator_change+0x197/0x500
[  331.225653] [   T1287]        elv_iosched_store+0x38f/0x430
[  331.226557] [   T1287]        queue_attr_store+0x25f/0x3e0
[  331.227458] [   T1287]        kernfs_fop_write_iter+0x3d6/0x5e0
[  331.228406] [   T1287]        vfs_write+0x4b3/0xf70
[  331.229232] [   T1287]        ksys_write+0x112/0x250
[  331.230068] [   T1287]        do_syscall_64+0xdf/0x790
[  331.230905] [   T1287]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  331.231904] [   T1287] 
                          -> #2 (&q->q_usage_counter(io)){++++}-{0:0}:
[  331.233380] [   T1287]        blk_alloc_queue+0x605/0x7a0
[  331.234276] [   T1287]        blk_mq_alloc_queue+0x168/0x270
[  331.235196] [   T1287]        scsi_alloc_sdev+0x86c/0xcd0
[  331.236087] [   T1287]        scsi_probe_and_add_lun+0x601/0xc50
[  331.237034] [   T1287]        __scsi_add_device+0x233/0x280
[  331.237926] [   T1287]        ata_scsi_scan_host+0x137/0x3a0
[  331.238826] [   T1287]        async_run_entry_fn+0x93/0x550
[  331.239717] [   T1287]        process_one_work+0x8b2/0x1640
[  331.240618] [   T1287]        worker_thread+0x5fd/0xfe0
[  331.241485] [   T1287]        kthread+0x367/0x460
[  331.242291] [   T1287]        ret_from_fork+0x655/0x9d0
[  331.243165] [   T1287]        ret_from_fork_asm+0x1a/0x30
[  331.244029] [   T1287] 
                          -> #1 (fs_reclaim){+.+.}-{0:0}:
[  331.245366] [   T1287]        fs_reclaim_acquire+0xd5/0x120
[  331.246266] [   T1287]        kmem_cache_alloc_lru_noprof+0x52/0x6c0
[  331.247257] [   T1287]        alloc_inode+0x9d/0x1e0
[  331.248093] [   T1287]        iget_locked+0x19d/0x630
[  331.248923] [   T1287]        kernfs_get_inode+0x42/0x440
[  331.249791] [   T1287]        kernfs_get_tree+0x5d0/0xbd0
[  331.250662] [   T1287]        sysfs_get_tree+0x3f/0x140
[  331.251523] [   T1287]        vfs_get_tree+0x87/0x2f0
[  331.252367] [   T1287]        fc_mount+0x16/0x220
[  331.253172] [   T1287]        path_mount+0x854/0x1d10
[  331.253998] [   T1287]        __x64_sys_mount+0x208/0x270
[  331.254860] [   T1287]        do_syscall_64+0xdf/0x790
[  331.255696] [   T1287]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  331.256706] [   T1287] 
                          -> #0 (&root->kernfs_rwsem){++++}-{4:4}:
[  331.258135] [   T1287]        __lock_acquire+0xe06/0x22e0
[  331.259001] [   T1287]        lock_acquire+0x1a5/0x330
[  331.259836] [   T1287]        down_write+0x8c/0x1f0
[  331.260639] [   T1287]        kernfs_remove_by_name_ns+0x53/0x160
[  331.261596] [   T1287]        sysfs_unmerge_group+0xd5/0x160
[  331.262508] [   T1287]        dev_pm_qos_hide_latency_tolerance+0x1f/0x60
[  331.263555] [   T1287]        nvme_uninit_ctrl+0x8f/0x110 [nvme_core]
[  331.264573] [   T1287]        nvme_tcp_create_ctrl+0x887/0xc20 [nvme_tcp]
[  331.265625] [   T1287]        nvmf_dev_write+0x40b/0x830 [nvme_fabrics]
[  331.266651] [   T1287]        vfs_write+0x1cc/0xf70
[  331.267472] [   T1287]        ksys_write+0x112/0x250
[  331.268308] [   T1287]        do_syscall_64+0xdf/0x790
[  331.269164] [   T1287]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  331.270178] [   T1287] 
                          other info that might help us debug this:

[  331.272111] [   T1287] Chain exists of:
                            &root->kernfs_rwsem --> sparse_irq_lock --> &root->kernfs_supers_rwsem

[  331.274536] [   T1287]  Possible unsafe locking scenario:

[  331.275924] [   T1287]        CPU0                    CPU1
[  331.276806] [   T1287]        ----                    ----
[  331.277684] [   T1287]   rlock(&root->kernfs_supers_rwsem);
[  331.278585] [   T1287]                                lock(sparse_irq_lock);
[  331.279668] [   T1287]                                lock(&root->kernfs_supers_rwsem);
[  331.280853] [   T1287]   lock(&root->kernfs_rwsem);
[  331.281672] [   T1287] 
                           *** DEADLOCK ***

[  331.283370] [   T1287] 3 locks held by nvme/1287:
[  331.284208] [   T1287]  #0: ffffffffc19e8260 (nvmf_dev_mutex){+.+.}-{4:4}, at: nvmf_dev_write+0x82/0x830 [nvme_fabrics]
[  331.285743] [   T1287]  #1: ffffffff87d02ce0 (dev_pm_qos_sysfs_mtx){+.+.}-{4:4}, at: dev_pm_qos_hide_latency_tolerance+0x17/0x60
[  331.287381] [   T1287]  #2: ffff8881009232a0 (&root->kernfs_supers_rwsem){++++}-{4:4}, at: kernfs_remove_by_name_ns+0x4b/0x160
[  331.288984] [   T1287] 
                          stack backtrace:
[  331.290235] [   T1287] CPU: 3 UID: 0 PID: 1287 Comm: nvme Not tainted 7.1.0+ #407 PREEMPT(full) 
[  331.290239] [   T1287] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-10.fc44 06/10/2025
[  331.290243] [   T1287] Call Trace:
[  331.290245] [   T1287]  <TASK>
[  331.290246] [   T1287]  dump_stack_lvl+0x6a/0x90
[  331.290253] [   T1287]  print_circular_bug.cold+0x189/0x1eb
[  331.290257] [   T1287]  check_noncircular+0x173/0x1a0
[  331.290261] [   T1287]  __lock_acquire+0xe06/0x22e0
[  331.290265] [   T1287]  lock_acquire+0x1a5/0x330
[  331.290267] [   T1287]  ? kernfs_remove_by_name_ns+0x53/0x160
[  331.290270] [   T1287]  ? __pfx___might_resched+0x10/0x10
[  331.290274] [   T1287]  down_write+0x8c/0x1f0
[  331.290276] [   T1287]  ? kernfs_remove_by_name_ns+0x53/0x160
[  331.290279] [   T1287]  ? __pfx_down_write+0x10/0x10
[  331.290280] [   T1287]  ? kernfs_root+0xac/0x1b0
[  331.290282] [   T1287]  ? lock_release+0x1b5/0x340
[  331.290285] [   T1287]  kernfs_remove_by_name_ns+0x53/0x160
[  331.290288] [   T1287]  sysfs_unmerge_group+0xd5/0x160
[  331.290291] [   T1287]  dev_pm_qos_hide_latency_tolerance+0x1f/0x60
[  331.290294] [   T1287]  nvme_uninit_ctrl+0x8f/0x110 [nvme_core]
[  331.290312] [   T1287]  nvme_tcp_create_ctrl+0x887/0xc20 [nvme_tcp]
[  331.290317] [   T1287]  ? nvmf_dev_write+0x2ff/0x830 [nvme_fabrics]
[  331.290324] [   T1287]  nvmf_dev_write+0x40b/0x830 [nvme_fabrics]
[  331.290329] [   T1287]  vfs_write+0x1cc/0xf70
[  331.290333] [   T1287]  ? __pfx_vfs_write+0x10/0x10
[  331.290338] [   T1287]  ksys_write+0x112/0x250
[  331.290341] [   T1287]  ? __pfx_ksys_write+0x10/0x10
[  331.290343] [   T1287]  ? kasan_quarantine_put+0x12e/0x260
[  331.290346] [   T1287]  ? kasan_quarantine_put+0x12e/0x260
[  331.290348] [   T1287]  do_syscall_64+0xdf/0x790
[  331.290352] [   T1287]  ? do_sys_openat2+0xfd/0x170
[  331.290354] [   T1287]  ? __pfx_do_sys_openat2+0x10/0x10
[  331.290356] [   T1287]  ? lock_is_held_type+0xf6/0x1b0
[  331.290359] [   T1287]  ? rcu_is_watching+0x11/0xb0
[  331.290362] [   T1287]  ? trace_hardirqs_on+0x14/0x1b0
[  331.290364] [   T1287]  ? lockdep_hardirqs_on+0x8c/0x130
[  331.290366] [   T1287]  ? __call_rcu_common.constprop.0+0x4af/0x1190
[  331.290370] [   T1287]  ? __x64_sys_openat+0x10a/0x210
[  331.290372] [   T1287]  ? __pfx___call_rcu_common.constprop.0+0x10/0x10
[  331.290375] [   T1287]  ? __pfx___x64_sys_openat+0x10/0x10
[  331.290378] [   T1287]  ? rcu_is_watching+0x11/0xb0
[  331.290380] [   T1287]  ? do_syscall_64+0x1ec/0x790
[  331.290382] [   T1287]  ? trace_hardirqs_on_prepare+0x14c/0x1a0
[  331.290384] [   T1287]  ? lockdep_hardirqs_on+0x8c/0x130
[  331.290386] [   T1287]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  331.290388] [   T1287]  ? do_syscall_64+0x20a/0x790
[  331.290391] [   T1287]  ? fput_close_sync+0xda/0x1b0
[  331.290393] [   T1287]  ? __pfx_fput_close_sync+0x10/0x10
[  331.290395] [   T1287]  ? do_raw_spin_unlock+0x55/0x230
[  331.290398] [   T1287]  ? rcu_is_watching+0x11/0xb0
[  331.290401] [   T1287]  ? do_syscall_64+0x1ec/0x790
[  331.290403] [   T1287]  ? trace_hardirqs_on_prepare+0x14c/0x1a0
[  331.290405] [   T1287]  ? lockdep_hardirqs_on+0x8c/0x130
[  331.290407] [   T1287]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  331.290409] [   T1287]  ? do_syscall_64+0x20a/0x790
[  331.290410] [   T1287]  ? count_memcg_events_mm.constprop.0+0x22/0x130
[  331.290414] [   T1287]  ? rcu_is_watching+0x11/0xb0
[  331.290417] [   T1287]  ? count_memcg_events+0x107/0x4e0
[  331.290419] [   T1287]  ? find_held_lock+0x2b/0x80
[  331.290422] [   T1287]  ? rcu_read_unlock+0x17/0x60
[  331.290425] [   T1287]  ? lock_release+0x1b5/0x340
[  331.290427] [   T1287]  ? find_held_lock+0x2b/0x80
[  331.290430] [   T1287]  ? exc_page_fault+0x94/0x140
[  331.290432] [   T1287]  ? lock_release+0x1b5/0x340
[  331.290435] [   T1287]  ? rcu_is_watching+0x11/0xb0
[  331.290438] [   T1287]  ? trace_hardirqs_on+0x14/0x1b0
[  331.290439] [   T1287]  ? preempt_count_add+0x7f/0x190
[  331.290442] [   T1287]  ? do_syscall_64+0x5d/0x790
[  331.290444] [   T1287]  ? do_syscall_64+0x8d/0x790
[  331.290446] [   T1287]  ? irqentry_exit+0xfc/0x790
[  331.290449] [   T1287]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  331.290451] [   T1287] RIP: 0033:0x7f407046277e
[  331.290456] [   T1287] Code: 4d 89 d8 e8 d4 bc 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
[  331.290458] [   T1287] RSP: 002b:00007ffc375d7ae0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  331.290463] [   T1287] RAX: ffffffffffffffda RBX: 000000003c2f7680 RCX: 00007f407046277e
[  331.290464] [   T1287] RDX: 00000000000000cf RSI: 000000003c2f7680 RDI: 0000000000000003
[  331.290466] [   T1287] RBP: 00007ffc375d7af0 R08: 0000000000000000 R09: 0000000000000000
[  331.290467] [   T1287] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[  331.290468] [   T1287] R13: 000000003c2f5540 R14: 00007f4070638818 R15: 0000000000000001
[  331.290472] [   T1287]  </TASK>
[  331.336870] [      C0] clocksource: Watchdog remote CPU 3 read timed out
[  331.494199] [    T373] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349, TLS.
[  331.504462] [   T1298] nvme nvme5: creating 4 I/O queues.
[  331.546425] [   T1298] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  331.552151] [   T1298] nvme nvme5: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  332.171342] [   T1363] nvme nvme5: Removing ctrl: NQN "blktests-subsystem-1"


[9] nvme/063 dmesg on v7.1 kernel

[  353.490641] [   T1285] run blktests nvme/063 at 2026-06-24 10:54:40
[  353.790326] [   T1754] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  353.823845] [   T1757] nvmet: Allow non-TLS connections while TLS1.3 is enabled
[  353.835314] [   T1760] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  354.034947] [    T458] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[  354.053192] [    T358] nvme nvme5: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[  354.055130] [   T1770] nvme nvme5: qid 0: authenticated
[  354.163171] [     T11] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349, TLS.
[  354.173339] [   T1770] nvme nvme5: creating 4 I/O queues.
[  354.234059] [   T1770] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  354.244089] [   T1770] nvme nvme5: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  354.714629] [   T1846] nvme nvme5: resetting controller
[  354.730193] [    T295] nvmet: Created nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[  354.738053] [     T49] nvme nvme5: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[  354.739399] [    T361] nvme nvme5: qid 0: authenticated
[  354.770783] [    T295] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349, TLS.
[  354.776061] [    T361] nvme nvme5: creating 4 I/O queues.
[  354.848673] [    T361] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  354.933728] [   T1861] nvme nvme5: Removing ctrl: NQN "blktests-subsystem-1"

[  354.954040] [   T1861] ======================================================
[  354.956228] [   T1861] WARNING: possible circular locking dependency detected
[  354.958462] [   T1861] 7.1.0 #5 Not tainted
[  354.959747] [   T1861] ------------------------------------------------------
[  354.961969] [   T1861] nvme/1861 is trying to acquire lock:
[  354.963758] [   T1861] ffff88812a332518 (set->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0xc1/0x2f0
[  354.966758] [   T1861] 
                          but task is already holding lock:
[  354.969092] [   T1861] ffff88810d596bf8 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x197/0x500
[  354.971880] [   T1861] 
                          which lock already depends on the new lock.

[  354.976664] [   T1861] 
                          the existing dependency chain (in reverse order) is:
[  354.980056] [   T1861] 
                          -> #5 (&q->elevator_lock){+.+.}-{4:4}:
[  354.982450] [   T1861]        __mutex_lock+0x1ae/0x2650
[  354.983578] [   T1861]        elevator_change+0x197/0x500
[  354.984623] [   T1861]        elv_iosched_store+0x38f/0x430
[  354.985737] [   T1861]        queue_attr_store+0x25f/0x3e0
[  354.986901] [   T1861]        kernfs_fop_write_iter+0x3d6/0x5e0
[  354.988242] [   T1861]        vfs_write+0x4b3/0xf70
[  354.989283] [   T1861]        ksys_write+0x112/0x250
[  354.990311] [   T1861]        do_syscall_64+0xdf/0x790
[  354.991481] [   T1861]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  354.992718] [   T1861] 
                          -> #4 (&q->q_usage_counter(io)){++++}-{0:0}:
[  354.994760] [   T1861]        blk_alloc_queue+0x605/0x7a0
[  354.995936] [   T1861]        blk_mq_alloc_queue+0x168/0x270
[  354.997058] [   T1861]        scsi_alloc_sdev+0x86c/0xcd0
[  354.998151] [   T1861]        scsi_probe_and_add_lun+0x601/0xc50
[  354.999296] [   T1861]        __scsi_add_device+0x233/0x280
[  355.000391] [   T1861]        ata_scsi_scan_host+0x137/0x3a0
[  355.001434] [   T1861]        async_run_entry_fn+0x93/0x550
[  355.002475] [   T1861]        process_one_work+0x8b6/0x1650
[  355.003571] [   T1861]        worker_thread+0x5fd/0xfe0
[  355.004559] [   T1861]        kthread+0x36a/0x460
[  355.005471] [   T1861]        ret_from_fork+0x655/0x9d0
[  355.006445] [   T1861]        ret_from_fork_asm+0x1a/0x30
[  355.007454] [   T1861] 
                          -> #3 (fs_reclaim){+.+.}-{0:0}:
[  355.009116] [   T1861]        fs_reclaim_acquire+0xd5/0x120
[  355.010151] [   T1861]        __kmalloc_cache_noprof+0x58/0x720
[  355.011270] [   T1861]        __request_module+0x253/0x610
[  355.012285] [   T1861]        tcp_set_ulp+0x395/0x5e0
[  355.013306] [   T1861]        do_tcp_setsockopt+0x4a9/0x26a0
[  355.014296] [   T1861]        do_sock_setsockopt+0x163/0x3b0
[  355.015198] [   T1861]        __sys_setsockopt+0xe0/0x150
[  355.016140] [   T1861]        __x64_sys_setsockopt+0xb9/0x180
[  355.017098] [   T1861]        do_syscall_64+0xdf/0x790
[  355.018038] [   T1861]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  355.019093] [   T1861] 
                          -> #2 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
[  355.020508] [   T1861]        lock_sock_nested+0x32/0xf0
[  355.021427] [   T1861]        tls_sw_sendmsg+0x1b4/0x23b0 [tls]
[  355.022442] [   T1861]        sock_sendmsg+0x305/0x3c0
[  355.023332] [   T1861]        nvme_tcp_init_connection+0x3d8/0x970 [nvme_tcp]
[  355.024658] [   T1861]        nvme_tcp_alloc_queue+0xf92/0x1ba0 [nvme_tcp]
[  355.025873] [   T1861]        nvme_tcp_alloc_admin_queue+0xff/0x440 [nvme_tcp]
[  355.027046] [   T1861]        nvme_tcp_setup_ctrl+0x188/0x8a0 [nvme_tcp]
[  355.028079] [   T1861]        nvme_tcp_create_ctrl+0x874/0xc20 [nvme_tcp]
[  355.029186] [   T1861]        nvmf_dev_write+0x40b/0x830 [nvme_fabrics]
[  355.030272] [   T1861]        vfs_write+0x1cc/0xf70
[  355.031137] [   T1861]        ksys_write+0x112/0x250
[  355.032031] [   T1861]        do_syscall_64+0xdf/0x790
[  355.032919] [   T1861]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  355.033977] [   T1861] 
                          -> #1 (&ctx->tx_lock){+.+.}-{4:4}:
[  355.035415] [   T1861]        __mutex_lock+0x1ae/0x2650
[  355.036296] [   T1861]        tls_sw_sendmsg+0x130/0x23b0 [tls]
[  355.037287] [   T1861]        sock_sendmsg+0x305/0x3c0
[  355.038171] [   T1861]        nvme_tcp_try_send_cmd_pdu+0x630/0xcf0 [nvme_tcp]
[  355.039293] [   T1861]        nvme_tcp_try_send+0x1ef/0xa60 [nvme_tcp]
[  355.040362] [   T1861]        nvme_tcp_queue_rq+0xfa3/0x19e0 [nvme_tcp]
[  355.041470] [   T1861]        blk_mq_dispatch_rq_list+0x3e0/0x2420
[  355.042456] [   T1861]        __blk_mq_sched_dispatch_requests+0x20a/0x15d0
[  355.043618] [   T1861]        blk_mq_sched_dispatch_requests+0xab/0x150
[  355.044725] [   T1861]        blk_mq_run_work_fn+0x135/0x2e0
[  355.045739] [   T1861]        process_one_work+0x8b6/0x1650
[  355.046753] [   T1861]        worker_thread+0x5fd/0xfe0
[  355.047663] [   T1861]        kthread+0x36a/0x460
[  355.048507] [   T1861]        ret_from_fork+0x655/0x9d0
[  355.049433] [   T1861]        ret_from_fork_asm+0x1a/0x30
[  355.050354] [   T1861] 
                          -> #0 (set->srcu){.+.+}-{0:0}:
[  355.051837] [   T1861]        __lock_acquire+0xe06/0x22e0
[  355.052584] [   T1861]        lock_sync+0xbf/0x120
[  355.053245] [   T1861]        __synchronize_srcu+0xe1/0x2f0
[  355.053967] [   T1861]        elevator_switch+0x2bd/0x670
[  355.054671] [   T1861]        elevator_change+0x2e7/0x500
[  355.055360] [   T1861]        elevator_set_none+0xaa/0xf0
[  355.056129] [   T1861]        blk_unregister_queue+0x15e/0x2e0
[  355.056911] [   T1861]        __del_gendisk+0x28f/0xaa0
[  355.057530] [   T1861]        del_gendisk+0x11a/0x1c0
[  355.058166] [   T1861]        nvme_ns_remove+0x331/0x940 [nvme_core]
[  355.058897] [   T1861]        nvme_remove_namespaces+0x289/0x3f0 [nvme_core]
[  355.059623] [   T1861]        nvme_do_delete_ctrl+0xf6/0x160 [nvme_core]
[  355.060683] [   T1861]        nvme_delete_ctrl_sync.cold+0x8/0xd [nvme_core]
[  355.061487] [   T1861]        nvme_sysfs_delete+0xba/0xe0 [nvme_core]
[  355.062274] [   T1861]        kernfs_fop_write_iter+0x3d6/0x5e0
[  355.063033] [   T1861]        vfs_write+0x4b3/0xf70
[  355.063694] [   T1861]        ksys_write+0x112/0x250
[  355.064372] [   T1861]        do_syscall_64+0xdf/0x790
[  355.065092] [   T1861]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  355.065912] [   T1861] 
                          other info that might help us debug this:

[  355.067898] [   T1861] Chain exists of:
                            set->srcu --> &q->q_usage_counter(io) --> &q->elevator_lock

[  355.070120] [   T1861]  Possible unsafe locking scenario:

[  355.071461] [   T1861]        CPU0                    CPU1
[  355.072232] [   T1861]        ----                    ----
[  355.073014] [   T1861]   lock(&q->elevator_lock);
[  355.073692] [   T1861]                                lock(&q->q_usage_counter(io));
[  355.074592] [   T1861]                                lock(&q->elevator_lock);
[  355.075498] [   T1861]   sync(set->srcu);
[  355.076133] [   T1861] 
                           *** DEADLOCK ***

[  355.077737] [   T1861] 5 locks held by nvme/1861:
[  355.078437] [   T1861]  #0: ffff88812a9a6410 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x112/0x250
[  355.079384] [   T1861]  #1: ffff888135ef9480 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x257/0x5e0
[  355.080388] [   T1861]  #2: ffff88810a849008 (kn->active#144){++++}-{0:0}, at: sysfs_remove_file_self+0x61/0xb0
[  355.081392] [   T1861]  #3: ffff88810641c1c8 (&set->update_nr_hwq_lock){++++}-{4:4}, at: del_gendisk+0x112/0x1c0
[  355.082436] [   T1861]  #4: ffff88810d596bf8 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x197/0x500
[  355.083442] [   T1861] 
                          stack backtrace:
[  355.084570] [   T1861] CPU: 3 UID: 0 PID: 1861 Comm: nvme Not tainted 7.1.0 #5 PREEMPT(full) 
[  355.084573] [   T1861] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-10.fc44 06/10/2025
[  355.084576] [   T1861] Call Trace:
[  355.084578] [   T1861]  <TASK>
[  355.084580] [   T1861]  dump_stack_lvl+0x6a/0x90
[  355.084586] [   T1861]  print_circular_bug.cold+0x189/0x1eb
[  355.084590] [   T1861]  check_noncircular+0x173/0x1a0
[  355.084595] [   T1861]  __lock_acquire+0xe06/0x22e0
[  355.084598] [   T1861]  lock_sync+0xbf/0x120
[  355.084600] [   T1861]  ? __synchronize_srcu+0xc1/0x2f0
[  355.084602] [   T1861]  ? __synchronize_srcu+0xc1/0x2f0
[  355.084604] [   T1861]  __synchronize_srcu+0xe1/0x2f0
[  355.084606] [   T1861]  ? __pfx___synchronize_srcu+0x10/0x10
[  355.084608] [   T1861]  ? lock_acquire+0x1a5/0x330
[  355.084611] [   T1861]  ? _raw_spin_unlock_irqrestore+0x35/0x60
[  355.084614] [   T1861]  ? synchronize_srcu+0xae/0x3f0
[  355.084616] [   T1861]  elevator_switch+0x2bd/0x670
[  355.084619] [   T1861]  elevator_change+0x2e7/0x500
[  355.084622] [   T1861]  elevator_set_none+0xaa/0xf0
[  355.084624] [   T1861]  ? __pfx_elevator_set_none+0x10/0x10
[  355.084626] [   T1861]  ? kobject_put+0x62/0x530
[  355.084631] [   T1861]  blk_unregister_queue+0x15e/0x2e0
[  355.084633] [   T1861]  __del_gendisk+0x28f/0xaa0
[  355.084635] [   T1861]  ? down_read+0xbd/0x4d0
[  355.084637] [   T1861]  ? down_read+0x148/0x4d0
[  355.084638] [   T1861]  ? __pfx___del_gendisk+0x10/0x10
[  355.084641] [   T1861]  ? __pfx_down_read+0x10/0x10
[  355.084642] [   T1861]  ? up_write+0x201/0x540
[  355.084645] [   T1861]  ? up_write+0x2ad/0x540
[  355.084647] [   T1861]  del_gendisk+0x11a/0x1c0
[  355.084649] [   T1861]  nvme_ns_remove+0x331/0x940 [nvme_core]
[  355.084666] [   T1861]  nvme_remove_namespaces+0x289/0x3f0 [nvme_core]
[  355.084681] [   T1861]  ? __pfx_nvme_remove_namespaces+0x10/0x10 [nvme_core]
[  355.084693] [   T1861]  nvme_do_delete_ctrl+0xf6/0x160 [nvme_core]
[  355.084705] [   T1861]  nvme_delete_ctrl_sync.cold+0x8/0xd [nvme_core]
[  355.084717] [   T1861]  nvme_sysfs_delete+0xba/0xe0 [nvme_core]
[  355.084729] [   T1861]  ? __pfx_sysfs_kf_write+0x10/0x10
[  355.084732] [   T1861]  kernfs_fop_write_iter+0x3d6/0x5e0
[  355.084735] [   T1861]  ? __pfx_kernfs_fop_write_iter+0x10/0x10
[  355.084736] [   T1861]  vfs_write+0x4b3/0xf70
[  355.084739] [   T1861]  ? __pfx_vfs_write+0x10/0x10
[  355.084741] [   T1861]  ? rcu_is_watching+0x11/0xb0
[  355.084745] [   T1861]  ? kmem_cache_free+0x163/0x6b0
[  355.084749] [   T1861]  ksys_write+0x112/0x250
[  355.084751] [   T1861]  ? __pfx_ksys_write+0x10/0x10
[  355.084753] [   T1861]  ? __x64_sys_close+0x87/0xf0
[  355.084756] [   T1861]  do_syscall_64+0xdf/0x790
[  355.084758] [   T1861]  ? __x64_sys_openat+0x10a/0x210
[  355.084760] [   T1861]  ? __pfx___x64_sys_openat+0x10/0x10
[  355.084762] [   T1861]  ? rcu_is_watching+0x11/0xb0
[  355.084764] [   T1861]  ? do_syscall_64+0x1ec/0x790
[  355.084766] [   T1861]  ? trace_hardirqs_on_prepare+0x14c/0x1a0
[  355.084769] [   T1861]  ? lockdep_hardirqs_on+0x8c/0x130
[  355.084772] [   T1861]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  355.084773] [   T1861]  ? do_syscall_64+0x20a/0x790
[  355.084776] [   T1861]  ? lock_is_held_type+0xf6/0x1b0
[  355.084778] [   T1861]  ? rcu_is_watching+0x11/0xb0
[  355.084780] [   T1861]  ? trace_hardirqs_on+0x14/0x1b0
[  355.084781] [   T1861]  ? lockdep_hardirqs_on+0x8c/0x130
[  355.084783] [   T1861]  ? __call_rcu_common.constprop.0+0x4af/0x11a0
[  355.084785] [   T1861]  ? __call_rcu_common.constprop.0+0x4af/0x11a0
[  355.084787] [   T1861]  ? __pfx___call_rcu_common.constprop.0+0x10/0x10
[  355.084791] [   T1861]  ? fput_close_sync+0xda/0x1b0
[  355.084792] [   T1861]  ? kmem_cache_free+0x47e/0x6b0
[  355.084795] [   T1861]  ? fput_close_sync+0xda/0x1b0
[  355.084796] [   T1861]  ? __pfx_fput_close_sync+0x10/0x10
[  355.084798] [   T1861]  ? do_raw_spin_unlock+0x55/0x230
[  355.084800] [   T1861]  ? rcu_is_watching+0x11/0xb0
[  355.084802] [   T1861]  ? do_syscall_64+0x1ec/0x790
[  355.084803] [   T1861]  ? trace_hardirqs_on_prepare+0x14c/0x1a0
[  355.084805] [   T1861]  ? lockdep_hardirqs_on+0x8c/0x130
[  355.084807] [   T1861]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  355.084808] [   T1861]  ? do_syscall_64+0x20a/0x790
[  355.084809] [   T1861]  ? do_syscall_64+0x1ec/0x790
[  355.084811] [   T1861]  ? trace_hardirqs_on_prepare+0x14c/0x1a0
[  355.084812] [   T1861]  ? lockdep_hardirqs_on+0x8c/0x130
[  355.084814] [   T1861]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  355.084815] [   T1861]  ? do_syscall_64+0x20a/0x790
[  355.084817] [   T1861]  ? trace_hardirqs_on_prepare+0x14c/0x1a0
[  355.084818] [   T1861]  ? rcu_is_watching+0x11/0xb0
[  355.084820] [   T1861]  ? trace_hardirqs_on+0x14/0x1b0
[  355.084822] [   T1861]  ? preempt_count_add+0x7f/0x190
[  355.084825] [   T1861]  ? do_syscall_64+0x5d/0x790
[  355.084826] [   T1861]  ? do_syscall_64+0x8d/0x790
[  355.084828] [   T1861]  ? irqentry_exit+0xfc/0x790
[  355.084830] [   T1861]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  355.084832] [   T1861] RIP: 0033:0x7fe310bc408e
[  355.084836] [   T1861] Code: 4d 89 d8 e8 94 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 03 ff ff ff 0f 1f 00 f3 0f 1e fa
[  355.084838] [   T1861] RSP: 002b:00007ffe4bbbbdf0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  355.084841] [   T1861] RAX: ffffffffffffffda RBX: 00007fe310d8a566 RCX: 00007fe310bc408e
[  355.084842] [   T1861] RDX: 0000000000000001 RSI: 00007fe310d8a566 RDI: 0000000000000003
[  355.084843] [   T1861] RBP: 00007ffe4bbbbe00 R08: 0000000000000000 R09: 0000000000000000
[  355.084844] [   T1861] R10: 0000000000000000 R11: 0000000000000202 R12: 000000001ddfd910
[  355.084845] [   T1861] R13: 00007ffe4bbbe5e6 R14: 000000001ddfd720 R15: 000000001ddffb50
[  355.084848] [   T1861]  </TASK>
[  355.363487] [    T295] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[  355.381067] [     T49] nvme nvme5: qid 0: authenticated with hash hmac(sha384) dhgroup ffdhe3072
[  355.385933] [   T1873] nvme nvme5: qid 0: authenticated
[  355.413432] [     T11] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349, TLS.
[  355.431349] [   T1873] nvme nvme5: creating 4 I/O queues.
[  355.473722] [   T1873] nvme nvme5: mapped 4/0/0 default/read/poll queues.
[  355.479399] [   T1873] nvme nvme5: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  355.956596] [   T1931] nvme nvme5: Removing ctrl: NQN "blktests-subsystem-1"

^ permalink raw reply

* Re: [PATCH v17 06/10] rust: rename `AlwaysRefCounted` to `RefCounted`.
From: Onur Özkan @ 2026-06-23 17:58 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman,
	Dave Ertman, Ira Weiny, Leon Romanovsky, Paul Moore, Serge Hallyn,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Alexander Viro,
	Christian Brauner, Jan Kara, Daniel Almeida, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
	Krzysztof Wilczyński, Boqun Feng, Uladzislau Rezki,
	Lorenzo Stoakes, Vlastimil Babka, Liam R. Howlett, Igor Korotin,
	Pavel Tikhomirov, linux-kernel, rust-for-linux, linux-block,
	linux-security-module, dri-devel, linux-fsdevel, linux-mm,
	linux-pm, linux-pci, driver-core, Oliver Mangold, Viresh Kumar
In-Reply-To: <20260604-unique-ref-v17-6-7b4c3d2930b9@kernel.org>

On Thu, 04 Jun 2026 22:11:18 +0200
Andreas Hindborg <a.hindborg@kernel.org> wrote:

> From: Oliver Mangold <oliver.mangold@pm.me>
> 
> There are types where it may both be reference counted in some cases and
> owned in others. In such cases, obtaining `ARef<T>` from `&T` would be
> unsound as it allows creation of `ARef<T>` copy from `&Owned<T>`.
> 
> Therefore, we split `AlwaysRefCounted` into `RefCounted` (which `ARef<T>`
> would require) and a marker trait to indicate that the type is always
> reference counted (and not `Ownable`) so the `&T` -> `ARef<T>` conversion
> is possible.
> 
> - Rename `AlwaysRefCounted` to `RefCounted`.
> - Add a new unsafe trait `AlwaysRefCounted`.
> - Implement the new trait `AlwaysRefCounted` for the newly renamed
>   `RefCounted` implementations. This leaves functionality of existing
>   implementers of `AlwaysRefCounted` intact.
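
For readers following along, here is a rough userspace sketch of the split described above. It is not the kernel code: `Counted` and `demo` are hypothetical names, the refcount is a plain `Cell<usize>` rather than a C-side counter, and the traits are simplified stand-ins for the ones in rust/kernel/sync/aref.rs. It only illustrates that `ARef<T>` needs `RefCounted`, while the `&T` -> `ARef<T>` conversion is gated on the marker trait.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::ptr::NonNull;

/// Stand-in for `RefCounted`: custom increment/decrement, nothing more.
unsafe trait RefCounted {
    fn inc_ref(&self);
    /// Safety: the caller must own an increment that this call gives up.
    unsafe fn dec_ref(obj: NonNull<Self>);
}

/// Stand-in for the marker trait: promises that `&T` -> `ARef<T>` is sound.
unsafe trait AlwaysRefCounted: RefCounted {}

/// Simplified `ARef`: owns exactly one increment of the refcount.
struct ARef<T: RefCounted> {
    ptr: NonNull<T>,
}

impl<T: RefCounted> Deref for ARef<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: the increment owned by `self` keeps the object alive.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T: RefCounted> Drop for ARef<T> {
    fn drop(&mut self) {
        // SAFETY: `self` owns one increment, which is released here.
        unsafe { T::dec_ref(self.ptr) };
    }
}

// Only types carrying the marker can be upgraded from a shared reference.
impl<T: AlwaysRefCounted> From<&T> for ARef<T> {
    fn from(obj: &T) -> Self {
        obj.inc_ref();
        ARef { ptr: NonNull::from(obj) }
    }
}

/// Hypothetical example type with an in-object counter.
struct Counted {
    refs: Cell<usize>,
}

// SAFETY (sketch only): the counter is adjusted symmetrically.
unsafe impl RefCounted for Counted {
    fn inc_ref(&self) {
        self.refs.set(self.refs.get() + 1);
    }
    unsafe fn dec_ref(obj: NonNull<Self>) {
        // SAFETY: the caller guarantees the object is still alive.
        let this = unsafe { obj.as_ref() };
        this.refs.set(this.refs.get() - 1);
    }
}

// SAFETY (sketch only): `Counted` is never handed out as a unique owner.
unsafe impl AlwaysRefCounted for Counted {}

/// Returns the count while an extra `ARef` exists, then after it is dropped.
fn demo() -> (usize, usize) {
    let obj = Counted { refs: Cell::new(1) };
    let during = {
        let extra: ARef<Counted> = ARef::from(&obj);
        extra.refs.get()
    };
    (during, obj.refs.get())
}
```

A type that implemented only `RefCounted` in this sketch could still be wrapped in `ARef`, but `ARef::from(&obj)` would not compile for it, which is the property the commit message is after.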
> 
> Suggested-by: Alice Ryhl <aliceryhl@google.com>
> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
> Signed-off-by: Oliver Mangold <oliver.mangold@pm.me>
> [ Andreas: Updated commit message and rebase on rust-6.20-7.0 ]
> Acked-by: Igor Korotin <igor.korotin.linux@gmail.com>
> Acked-by: Danilo Krummrich <dakr@kernel.org>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Reviewed-by: Gary Guo <gary@garyguo.net>
> Co-developed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
> ---
>  rust/kernel/auxiliary.rs        |  7 +++++-
>  rust/kernel/block/mq/request.rs | 15 ++++++++-----
>  rust/kernel/cred.rs             | 13 +++++++++--
>  rust/kernel/device.rs           | 12 ++++++++--
>  rust/kernel/device/property.rs  | 11 +++++++--
>  rust/kernel/drm/device.rs       |  9 ++++++--
>  rust/kernel/drm/gem/mod.rs      | 16 ++++++++++----
>  rust/kernel/fs/file.rs          | 16 ++++++++++----
>  rust/kernel/i2c.rs              | 13 ++++++++---
>  rust/kernel/mm.rs               | 15 +++++++++----
>  rust/kernel/mm/mmput_async.rs   |  9 ++++++--
>  rust/kernel/opp.rs              | 10 ++++++---
>  rust/kernel/owned.rs            |  2 +-
>  rust/kernel/pci.rs              | 10 ++++++++-
>  rust/kernel/pid_namespace.rs    | 12 ++++++++--
>  rust/kernel/platform.rs         |  7 +++++-
>  rust/kernel/sync/aref.rs        | 49 ++++++++++++++++++++++++++---------------
>  rust/kernel/task.rs             | 13 +++++++++--
>  rust/kernel/types.rs            |  3 ++-
>  rust/kernel/usb.rs              | 17 +++++++++++---
>  20 files changed, 195 insertions(+), 64 deletions(-)
> 
> diff --git a/rust/kernel/auxiliary.rs b/rust/kernel/auxiliary.rs
> index 93c0db1f6655..49f07740f657 100644
> --- a/rust/kernel/auxiliary.rs
> +++ b/rust/kernel/auxiliary.rs
> @@ -19,6 +19,7 @@
>          to_result, //
>      },
>      prelude::*,
> +    sync::aref::{AlwaysRefCounted, RefCounted},

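This patch adds several horizontal `use` lists, here and in other files.
Could they use the vertical style with a trailing `//`, as the device.rs
hunk in this patch already does?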
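
To illustrate the layout meant here, a small standalone example follows. It uses `std` paths as stand-ins for the kernel ones, and `import_demo` is a made-up helper that exists only so the imports are used. The trailing `//` is, as far as I understand it, there to stop rustfmt from joining the list back onto one line.

```rust
// Horizontal form, as in the hunk above:
//     use std::collections::{BTreeMap, BTreeSet};
//
// Vertical form: one item per line, with a trailing `//` on the last item.
use std::collections::{
    BTreeMap,
    BTreeSet, //
};

/// Hypothetical helper that exercises both imports.
fn import_demo() -> (usize, usize) {
    let mut map = BTreeMap::new();
    map.insert("inc_ref", 1);

    let mut set = BTreeSet::new();
    set.insert("dec_ref");
    set.insert("dec_ref"); // duplicate, the set keeps a single entry

    (map.len(), set.len())
}
```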

Onur

>      types::Opaque,
>      ThisModule, //
>  };
> @@ -289,7 +290,7 @@ unsafe impl<Ctx: device::DeviceContext> device::AsBusDevice<Ctx> for Device<Ctx>
>  kernel::impl_device_context_into_aref!(Device);
>  
>  // SAFETY: Instances of `Device` are always reference-counted.
> -unsafe impl crate::sync::aref::AlwaysRefCounted for Device {
> +unsafe impl RefCounted for Device {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
>          unsafe { bindings::get_device(self.as_ref().as_raw()) };
> @@ -308,6 +309,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
> +// `&Device`.
> +unsafe impl AlwaysRefCounted for Device {}
> +
>  impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for Device<Ctx> {
>      fn as_ref(&self) -> &device::Device<Ctx> {
>          // SAFETY: By the type invariant of `Self`, `self.as_raw()` is a pointer to a valid
> diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
> index ce3e30c81cb5..cf013b9e2cac 100644
> --- a/rust/kernel/block/mq/request.rs
> +++ b/rust/kernel/block/mq/request.rs
> @@ -9,7 +9,7 @@
>      block::mq::Operations,
>      error::Result,
>      sync::{
> -        aref::{ARef, AlwaysRefCounted},
> +        aref::{ARef, AlwaysRefCounted, RefCounted},
>          atomic::Relaxed,
>          Refcount,
>      },
> @@ -229,11 +229,10 @@ unsafe impl<T: Operations> Send for Request<T> {}
>  // mutate `self` are internally synchronized`
>  unsafe impl<T: Operations> Sync for Request<T> {}
>  
> -// SAFETY: All instances of `Request<T>` are reference counted. This
> -// implementation of `AlwaysRefCounted` ensure that increments to the ref count
> -// keeps the object alive in memory at least until a matching reference count
> -// decrement is executed.
> -unsafe impl<T: Operations> AlwaysRefCounted for Request<T> {
> +// SAFETY: All instances of `Request<T>` are reference counted. This implementation of `RefCounted`
> +// ensure that increments to the ref count keeps the object alive in memory at least until a
> +// matching reference count decrement is executed.
> +unsafe impl<T: Operations> RefCounted for Request<T> {
>      fn inc_ref(&self) {
>          self.wrapper_ref().refcount().inc();
>      }
> @@ -255,3 +254,7 @@ unsafe fn dec_ref(obj: core::ptr::NonNull<Self>) {
>          }
>      }
>  }
> +
> +// SAFETY: We currently do not implement `Ownable`, thus it is okay to obtain an `ARef<Request>`
> +// from a `&Request` (but this will change in the future).
> +unsafe impl<T: Operations> AlwaysRefCounted for Request<T> {}
> diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs
> index ffa156b9df37..20ef0144094b 100644
> --- a/rust/kernel/cred.rs
> +++ b/rust/kernel/cred.rs
> @@ -8,7 +8,12 @@
>  //!
>  //! Reference: <https://www.kernel.org/doc/html/latest/security/credentials.html>
>  
> -use crate::{bindings, sync::aref::AlwaysRefCounted, task::Kuid, types::Opaque};
> +use crate::{
> +    bindings,
> +    sync::aref::RefCounted,
> +    task::Kuid,
> +    types::{AlwaysRefCounted, Opaque},
> +};
>  
>  /// Wraps the kernel's `struct cred`.
>  ///
> @@ -76,7 +81,7 @@ pub fn euid(&self) -> Kuid {
>  }
>  
>  // SAFETY: The type invariants guarantee that `Credential` is always ref-counted.
> -unsafe impl AlwaysRefCounted for Credential {
> +unsafe impl RefCounted for Credential {
>      #[inline]
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> @@ -90,3 +95,7 @@ unsafe fn dec_ref(obj: core::ptr::NonNull<Credential>) {
>          unsafe { bindings::put_cred(obj.cast().as_ptr()) };
>      }
>  }
> +
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Credential>` from a
> +// `&Credential`.
> +unsafe impl AlwaysRefCounted for Credential {}
> diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
> index 6d5396a43ebe..efdf33617d12 100644
> --- a/rust/kernel/device.rs
> +++ b/rust/kernel/device.rs
> @@ -8,8 +8,12 @@
>      bindings,
>      fmt,
>      prelude::*,
> -    sync::aref::ARef,
> +    sync::aref::{
> +        ARef,
> +        RefCounted, //
> +    },
>      types::{
> +        AlwaysRefCounted,
>          ForeignOwnable,
>          Opaque, //
>      }, //
> @@ -508,7 +512,7 @@ pub fn name(&self) -> &CStr {
>  kernel::impl_device_context_into_aref!(Device);
>  
>  // SAFETY: Instances of `Device` are always reference-counted.
> -unsafe impl crate::sync::aref::AlwaysRefCounted for Device {
> +unsafe impl RefCounted for Device {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
>          unsafe { bindings::get_device(self.as_raw()) };
> @@ -520,6 +524,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
> +// `&Device`.
> +unsafe impl AlwaysRefCounted for Device {}
> +
>  // SAFETY: As by the type invariant `Device` can be sent to any thread.
>  unsafe impl Send for Device {}
>  
> diff --git a/rust/kernel/device/property.rs b/rust/kernel/device/property.rs
> index 5aead835fbbc..cee7e2501368 100644
> --- a/rust/kernel/device/property.rs
> +++ b/rust/kernel/device/property.rs
> @@ -14,7 +14,10 @@
>      fmt,
>      prelude::*,
>      str::{CStr, CString},
> -    sync::aref::ARef,
> +    sync::aref::{
> +        ARef,
> +        AlwaysRefCounted, //
> +    },
>      types::Opaque,
>  };
>  
> @@ -360,7 +363,7 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>  }
>  
>  // SAFETY: Instances of `FwNode` are always reference-counted.
> -unsafe impl crate::sync::aref::AlwaysRefCounted for FwNode {
> +unsafe impl crate::sync::aref::RefCounted for FwNode {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference guarantees that the
>          // refcount is non-zero.
> @@ -374,6 +377,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<FwNode>` from a
> +// `&FwNode`.
> +unsafe impl AlwaysRefCounted for FwNode {}
> +
>  enum Node<'a> {
>      Borrowed(&'a FwNode),
>      Owned(ARef<FwNode>),
> diff --git a/rust/kernel/drm/device.rs b/rust/kernel/drm/device.rs
> index adbafe8db54d..a5a040266aae 100644
> --- a/rust/kernel/drm/device.rs
> +++ b/rust/kernel/drm/device.rs
> @@ -15,7 +15,8 @@
>      prelude::*,
>      sync::aref::{
>          ARef,
> -        AlwaysRefCounted, //
> +        AlwaysRefCounted,
> +        RefCounted, //
>      },
>      types::Opaque,
>      workqueue::{
> @@ -217,7 +218,7 @@ fn deref(&self) -> &Self::Target {
>  
>  // SAFETY: DRM device objects are always reference counted and the get/put functions
>  // satisfy the requirements.
> -unsafe impl<T: drm::Driver> AlwaysRefCounted for Device<T> {
> +unsafe impl<T: drm::Driver> RefCounted for Device<T> {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
>          unsafe { bindings::drm_dev_get(self.as_raw()) };
> @@ -232,6 +233,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
> +// `&Device`.
> +unsafe impl<T: drm::Driver> AlwaysRefCounted for Device<T> {}
> +
>  impl<T: drm::Driver> AsRef<device::Device> for Device<T> {
>      fn as_ref(&self) -> &device::Device {
>          // SAFETY: `bindings::drm_device::dev` is valid as long as the DRM device itself is valid,
> diff --git a/rust/kernel/drm/gem/mod.rs b/rust/kernel/drm/gem/mod.rs
> index 75acda7ba500..f8cc2a0ff4c7 100644
> --- a/rust/kernel/drm/gem/mod.rs
> +++ b/rust/kernel/drm/gem/mod.rs
> @@ -17,7 +17,7 @@
>      prelude::*,
>      sync::aref::{
>          ARef,
> -        AlwaysRefCounted, //
> +        RefCounted, //
>      },
>      types::Opaque,
>  };
> @@ -29,7 +29,7 @@
>  #[cfg(CONFIG_RUST_DRM_GEM_SHMEM_HELPER)]
>  pub mod shmem;
>  
> -/// A macro for implementing [`AlwaysRefCounted`] for any GEM object type.
> +/// A macro for implementing [`RefCounted`] for any GEM object type.
>  ///
>  /// Since all GEM objects use the same refcounting scheme.
>  #[macro_export]
> @@ -42,7 +42,7 @@ impl $( <$( $tparam_id:ident ),+> )? for $type:ty
>          )?
>      ) => {
>          // SAFETY: All GEM objects are refcounted.
> -        unsafe impl $( <$( $tparam_id ),+> )? $crate::sync::aref::AlwaysRefCounted for $type
> +        unsafe impl $( <$( $tparam_id ),+> )? $crate::sync::aref::RefCounted for $type
>          where
>              Self: IntoGEMObject,
>              $( $( $bind_param : $bind_trait ),+ )?
> @@ -61,6 +61,14 @@ unsafe fn dec_ref(obj: core::ptr::NonNull<Self>) {
>                  unsafe { bindings::drm_gem_object_put(obj) };
>              }
>          }
> +
> +        // SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<$type>` from a
> +        // `&$type`.
> +        unsafe impl $( <$( $tparam_id ),+> )? $crate::sync::aref::AlwaysRefCounted for $type
> +        where
> +            Self: IntoGEMObject,
> +            $( $( $bind_param : $bind_trait ),+ )?
> +        {}
>      };
>  }
>  #[cfg_attr(not(CONFIG_RUST_DRM_GEM_SHMEM_HELPER), allow(unused))]
> @@ -98,7 +106,7 @@ fn close(_obj: &<Self::Driver as drm::Driver>::Object, _file: &DriverFile<Self>)
>  }
>  
>  /// Trait that represents a GEM object subtype
> -pub trait IntoGEMObject: Sized + super::private::Sealed + AlwaysRefCounted {
> +pub trait IntoGEMObject: Sized + super::private::Sealed + RefCounted {
>      /// Returns a reference to the raw `drm_gem_object` structure, which must be valid as long as
>      /// this owning object is valid.
>      fn as_raw(&self) -> *mut bindings::drm_gem_object;
> diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
> index 23ee689bd240..06e457d62a93 100644
> --- a/rust/kernel/fs/file.rs
> +++ b/rust/kernel/fs/file.rs
> @@ -12,8 +12,8 @@
>      cred::Credential,
>      error::{code::*, to_result, Error, Result},
>      fmt,
> -    sync::aref::{ARef, AlwaysRefCounted},
> -    types::{NotThreadSafe, Opaque},
> +    sync::aref::RefCounted,
> +    types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
>  };
>  use core::ptr;
>  
> @@ -197,7 +197,7 @@ unsafe impl Sync for File {}
>  
>  // SAFETY: The type invariants guarantee that `File` is always ref-counted. This implementation
>  // makes `ARef<File>` own a normal refcount.
> -unsafe impl AlwaysRefCounted for File {
> +unsafe impl RefCounted for File {
>      #[inline]
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> @@ -212,6 +212,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<File>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<File>` from a
> +// `&File`.
> +unsafe impl AlwaysRefCounted for File {}
> +
>  /// Wraps the kernel's `struct file`. Not thread safe.
>  ///
>  /// This type represents a file that is not known to be safe to transfer across thread boundaries.
> @@ -233,7 +237,7 @@ pub struct LocalFile {
>  
>  // SAFETY: The type invariants guarantee that `LocalFile` is always ref-counted. This implementation
>  // makes `ARef<LocalFile>` own a normal refcount.
> -unsafe impl AlwaysRefCounted for LocalFile {
> +unsafe impl RefCounted for LocalFile {
>      #[inline]
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> @@ -249,6 +253,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<LocalFile>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<LocalFile>` from a
> +// `&LocalFile`.
> +unsafe impl AlwaysRefCounted for LocalFile {}
> +
>  impl LocalFile {
>      /// Constructs a new `struct file` wrapper from a file descriptor.
>      ///
> diff --git a/rust/kernel/i2c.rs b/rust/kernel/i2c.rs
> index 7b908f0c5a58..56791c1d63d7 100644
> --- a/rust/kernel/i2c.rs
> +++ b/rust/kernel/i2c.rs
> @@ -18,7 +18,8 @@
>      prelude::*,
>      sync::aref::{
>          ARef,
> -        AlwaysRefCounted, //
> +        AlwaysRefCounted,
> +        RefCounted, //
>      },
>      types::Opaque, //
>  };
> @@ -415,7 +416,7 @@ pub fn get(index: i32) -> Result<ARef<Self>> {
>  kernel::impl_device_context_into_aref!(I2cAdapter);
>  
>  // SAFETY: Instances of `I2cAdapter` are always reference-counted.
> -unsafe impl AlwaysRefCounted for I2cAdapter {
> +unsafe impl RefCounted for I2cAdapter {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
>          unsafe { bindings::i2c_get_adapter(self.index()) };
> @@ -426,6 +427,9 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>          unsafe { bindings::i2c_put_adapter(obj.as_ref().as_raw()) }
>      }
>  }
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from an
> +// `&I2cAdapter`.
> +unsafe impl AlwaysRefCounted for I2cAdapter {}
>  
>  /// The i2c board info representation
>  ///
> @@ -491,7 +495,7 @@ unsafe impl<Ctx: device::DeviceContext> device::AsBusDevice<Ctx> for I2cClient<C
>  kernel::impl_device_context_into_aref!(I2cClient);
>  
>  // SAFETY: Instances of `I2cClient` are always reference-counted.
> -unsafe impl AlwaysRefCounted for I2cClient {
> +unsafe impl RefCounted for I2cClient {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
>          unsafe { bindings::get_device(self.as_ref().as_raw()) };
> @@ -502,6 +506,9 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>          unsafe { bindings::put_device(&raw mut (*obj.as_ref().as_raw()).dev) }
>      }
>  }
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from an
> +// `&I2cClient`.
> +unsafe impl AlwaysRefCounted for I2cClient {}
>  
>  impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for I2cClient<Ctx> {
>      fn as_ref(&self) -> &device::Device<Ctx> {
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 4764d7b68f2a..dd9e3969e720 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -13,8 +13,8 @@
>  
>  use crate::{
>      bindings,
> -    sync::aref::{ARef, AlwaysRefCounted},
> -    types::{NotThreadSafe, Opaque},
> +    sync::aref::RefCounted,
> +    types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
>  };
>  use core::{ops::Deref, ptr::NonNull};
>  
> @@ -55,7 +55,7 @@ unsafe impl Send for Mm {}
>  unsafe impl Sync for Mm {}
>  
>  // SAFETY: By the type invariants, this type is always refcounted.
> -unsafe impl AlwaysRefCounted for Mm {
> +unsafe impl RefCounted for Mm {
>      #[inline]
>      fn inc_ref(&self) {
>          // SAFETY: The pointer is valid since self is a reference.
> @@ -69,6 +69,9 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Mm>` from a `&Mm`.
> +unsafe impl AlwaysRefCounted for Mm {}
> +
>  /// A wrapper for the kernel's `struct mm_struct`.
>  ///
>  /// This type is like [`Mm`], but with non-zero `mm_users`. It can only be used when `mm_users` can
> @@ -91,7 +94,7 @@ unsafe impl Send for MmWithUser {}
>  unsafe impl Sync for MmWithUser {}
>  
>  // SAFETY: By the type invariants, this type is always refcounted.
> -unsafe impl AlwaysRefCounted for MmWithUser {
> +unsafe impl RefCounted for MmWithUser {
>      #[inline]
>      fn inc_ref(&self) {
>          // SAFETY: The pointer is valid since self is a reference.
> @@ -105,6 +108,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<MmWithUser>` from a
> +// `&MmWithUser`.
> +unsafe impl AlwaysRefCounted for MmWithUser {}
> +
>  // Make all `Mm` methods available on `MmWithUser`.
>  impl Deref for MmWithUser {
>      type Target = Mm;
> diff --git a/rust/kernel/mm/mmput_async.rs b/rust/kernel/mm/mmput_async.rs
> index b8d2f051225c..aba4ce675c86 100644
> --- a/rust/kernel/mm/mmput_async.rs
> +++ b/rust/kernel/mm/mmput_async.rs
> @@ -10,7 +10,8 @@
>  use crate::{
>      bindings,
>      mm::MmWithUser,
> -    sync::aref::{ARef, AlwaysRefCounted},
> +    sync::aref::RefCounted,
> +    types::{ARef, AlwaysRefCounted},
>  };
>  use core::{ops::Deref, ptr::NonNull};
>  
> @@ -34,7 +35,7 @@ unsafe impl Send for MmWithUserAsync {}
>  unsafe impl Sync for MmWithUserAsync {}
>  
>  // SAFETY: By the type invariants, this type is always refcounted.
> -unsafe impl AlwaysRefCounted for MmWithUserAsync {
> +unsafe impl RefCounted for MmWithUserAsync {
>      #[inline]
>      fn inc_ref(&self) {
>          // SAFETY: The pointer is valid since self is a reference.
> @@ -48,6 +49,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<MmWithUserAsync>`
> +// from a `&MmWithUserAsync`.
> +unsafe impl AlwaysRefCounted for MmWithUserAsync {}
> +
>  // Make all `MmWithUser` methods available on `MmWithUserAsync`.
>  impl Deref for MmWithUserAsync {
>      type Target = MmWithUser;
> diff --git a/rust/kernel/opp.rs b/rust/kernel/opp.rs
> index a760fac28765..06fe2ca776a4 100644
> --- a/rust/kernel/opp.rs
> +++ b/rust/kernel/opp.rs
> @@ -16,8 +16,8 @@
>      ffi::{c_char, c_ulong},
>      prelude::*,
>      str::CString,
> -    sync::aref::{ARef, AlwaysRefCounted},
> -    types::Opaque,
> +    sync::aref::RefCounted,
> +    types::{ARef, AlwaysRefCounted, Opaque},
>  };
>  
>  #[cfg(CONFIG_CPU_FREQ)]
> @@ -1041,7 +1041,7 @@ unsafe impl Send for OPP {}
>  unsafe impl Sync for OPP {}
>  
>  /// SAFETY: The type invariants guarantee that [`OPP`] is always refcounted.
> -unsafe impl AlwaysRefCounted for OPP {
> +unsafe impl RefCounted for OPP {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference means that the refcount is nonzero.
>          unsafe { bindings::dev_pm_opp_get(self.0.get()) };
> @@ -1053,6 +1053,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<OPP>` from an
> +// `&OPP`.
> +unsafe impl AlwaysRefCounted for OPP {}
> +
>  impl OPP {
>      /// Creates an owned reference to a [`OPP`] from a valid pointer.
>      ///
> diff --git a/rust/kernel/owned.rs b/rust/kernel/owned.rs
> index 5eacdf327d12..bedd4fef84fa 100644
> --- a/rust/kernel/owned.rs
> +++ b/rust/kernel/owned.rs
> @@ -27,7 +27,7 @@
>  ///
>  /// Note: The underlying object is not required to provide internal reference counting, because it
>  /// represents a unique, owned reference. If reference counting (on the Rust side) is required,
> -/// [`AlwaysRefCounted`](crate::types::AlwaysRefCounted) should be implemented.
> +/// [`RefCounted`](crate::types::RefCounted) should be implemented.
>  ///
>  /// # Examples
>  ///
> diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
> index af74ddff6114..acf7384fea02 100644
> --- a/rust/kernel/pci.rs
> +++ b/rust/kernel/pci.rs
> @@ -19,6 +19,10 @@
>      },
>      prelude::*,
>      str::CStr,
> +    sync::aref::{
> +        AlwaysRefCounted,
> +        RefCounted, //
> +    },
>      types::Opaque,
>      ThisModule, //
>  };
> @@ -474,7 +478,7 @@ unsafe impl<Ctx: device::DeviceContext> device::AsBusDevice<Ctx> for Device<Ctx>
>  impl crate::dma::Device for Device<device::Core> {}
>  
>  // SAFETY: Instances of `Device` are always reference-counted.
> -unsafe impl crate::sync::aref::AlwaysRefCounted for Device {
> +unsafe impl RefCounted for Device {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
>          unsafe { bindings::pci_dev_get(self.as_raw()) };
> @@ -486,6 +490,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
> +// `&Device`.
> +unsafe impl AlwaysRefCounted for Device {}
> +
>  impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for Device<Ctx> {
>      fn as_ref(&self) -> &device::Device<Ctx> {
>          // SAFETY: By the type invariant of `Self`, `self.as_raw()` is a pointer to a valid
> diff --git a/rust/kernel/pid_namespace.rs b/rust/kernel/pid_namespace.rs
> index 979a9718f153..4f6a94540e33 100644
> --- a/rust/kernel/pid_namespace.rs
> +++ b/rust/kernel/pid_namespace.rs
> @@ -7,7 +7,11 @@
>  //! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
>  //! [`include/linux/pid.h`](srctree/include/linux/pid.h)
>  
> -use crate::{bindings, sync::aref::AlwaysRefCounted, types::Opaque};
> +use crate::{
> +    bindings,
> +    sync::aref::RefCounted,
> +    types::{AlwaysRefCounted, Opaque},
> +};
>  use core::ptr;
>  
>  /// Wraps the kernel's `struct pid_namespace`. Thread safe.
> @@ -41,7 +45,7 @@ pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
>  }
>  
>  // SAFETY: Instances of `PidNamespace` are always reference-counted.
> -unsafe impl AlwaysRefCounted for PidNamespace {
> +unsafe impl RefCounted for PidNamespace {
>      #[inline]
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> @@ -55,6 +59,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<PidNamespace>` from
> +// a `&PidNamespace`.
> +unsafe impl AlwaysRefCounted for PidNamespace {}
> +
>  // SAFETY:
>  // - `PidNamespace::dec_ref` can be called from any thread.
>  // - It is okay to send ownership of `PidNamespace` across thread boundaries.
> diff --git a/rust/kernel/platform.rs b/rust/kernel/platform.rs
> index 8917d4ee499f..3c35aa94e319 100644
> --- a/rust/kernel/platform.rs
> +++ b/rust/kernel/platform.rs
> @@ -27,6 +27,7 @@
>      },
>      of,
>      prelude::*,
> +    sync::aref::{AlwaysRefCounted, RefCounted},
>      types::Opaque,
>      ThisModule, //
>  };
> @@ -512,7 +513,7 @@ pub fn optional_irq_by_name(&self, name: &CStr) -> Result<IrqRequest<'_>> {
>  impl crate::dma::Device for Device<device::Core> {}
>  
>  // SAFETY: Instances of `Device` are always reference-counted.
> -unsafe impl crate::sync::aref::AlwaysRefCounted for Device {
> +unsafe impl RefCounted for Device {
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.
>          unsafe { bindings::get_device(self.as_ref().as_raw()) };
> @@ -524,6 +525,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
> +// `&Device`.
> +unsafe impl AlwaysRefCounted for Device {}
> +
>  impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for Device<Ctx> {
>      fn as_ref(&self) -> &device::Device<Ctx> {
>          // SAFETY: By the type invariant of `Self`, `self.as_raw()` is a pointer to a valid
> diff --git a/rust/kernel/sync/aref.rs b/rust/kernel/sync/aref.rs
> index 4ee5fac0e0b6..2d656f672b97 100644
> --- a/rust/kernel/sync/aref.rs
> +++ b/rust/kernel/sync/aref.rs
> @@ -19,11 +19,9 @@
>  
>  use core::{marker::PhantomData, mem::ManuallyDrop, ops::Deref, ptr::NonNull};
>  
> -/// Types that are _always_ reference counted.
> +/// Types that are internally reference counted.
>  ///
>  /// It allows such types to define their own custom ref increment and decrement functions.
> -/// Additionally, it allows users to convert from a shared reference `&T` to an owned reference
> -/// [`ARef<T>`].
>  ///
>  /// This is usually implemented by wrappers to existing structures on the C side of the code. For
>  /// Rust code, the recommendation is to use [`Arc`](crate::sync::Arc) to create reference-counted
> @@ -40,9 +38,8 @@
>  /// at least until matching decrements are performed.
>  ///
>  /// Implementers must also ensure that all instances are reference-counted. (Otherwise they
> -/// won't be able to honour the requirement that [`AlwaysRefCounted::inc_ref`] keep the object
> -/// alive.)
> -pub unsafe trait AlwaysRefCounted {
> +/// won't be able to honour the requirement that [`RefCounted::inc_ref`] keep the object alive.)
> +pub unsafe trait RefCounted {
>      /// Increments the reference count on the object.
>      fn inc_ref(&self);
>  
> @@ -55,11 +52,27 @@ pub unsafe trait AlwaysRefCounted {
>      /// Callers must ensure that there was a previous matching increment to the reference count,
>      /// and that the object is no longer used after its reference count is decremented (as it may
>      /// result in the object being freed), unless the caller owns another increment on the refcount
> -    /// (e.g., it calls [`AlwaysRefCounted::inc_ref`] twice, then calls
> -    /// [`AlwaysRefCounted::dec_ref`] once).
> +    /// (e.g., it calls [`RefCounted::inc_ref`] twice, then calls [`RefCounted::dec_ref`] once).
>      unsafe fn dec_ref(obj: NonNull<Self>);
>  }
>  
> +/// Always reference-counted type.
> +///
> +/// It allows deriving a counted reference [`ARef<T>`] from a `&T`.
> +///
> +/// This provides some convenience, but it allows "escaping" borrow checks on `&T`. As it
> +/// complicates attempts to ensure that a reference to T is unique, it is optional to provide for
> +/// [`RefCounted`] types. See *Safety* below.
> +///
> +/// # Safety
> +///
> +/// Implementers must ensure that no safety invariants are violated by upgrading an `&T` to an
> +/// [`ARef<T>`]. In particular that implies [`AlwaysRefCounted`] and [`crate::types::Ownable`]
> +/// cannot be implemented for the same type, as this would allow violating the uniqueness guarantee
> +/// of [`crate::types::Owned<T>`] by dereferencing it into an `&T` and obtaining an [`ARef`] from
> +/// that.
> +pub unsafe trait AlwaysRefCounted: RefCounted {}
> +
>  /// An owned reference to an always-reference-counted object.
>  ///
>  /// The object's reference count is automatically decremented when an instance of [`ARef`] is
> @@ -70,7 +83,7 @@ pub unsafe trait AlwaysRefCounted {
>  ///
>  /// The pointer stored in `ptr` is non-null and valid for the lifetime of the [`ARef`] instance. In
>  /// particular, the [`ARef`] instance owns an increment on the underlying object's reference count.
> -pub struct ARef<T: AlwaysRefCounted> {
> +pub struct ARef<T: RefCounted> {
>      ptr: NonNull<T>,
>      _p: PhantomData<T>,
>  }
> @@ -79,19 +92,19 @@ pub struct ARef<T: AlwaysRefCounted> {
>  // it effectively means sharing `&T` (which is safe because `T` is `Sync`); additionally, it needs
>  // `T` to be `Send` because any thread that has an `ARef<T>` may ultimately access `T` using a
>  // mutable reference, for example, when the reference count reaches zero and `T` is dropped.
> -unsafe impl<T: AlwaysRefCounted + Sync + Send> Send for ARef<T> {}
> +unsafe impl<T: RefCounted + Sync + Send> Send for ARef<T> {}
>  
>  // SAFETY: It is safe to send `&ARef<T>` to another thread when the underlying `T` is `Sync`
>  // because it effectively means sharing `&T` (which is safe because `T` is `Sync`); additionally,
>  // it needs `T` to be `Send` because any thread that has a `&ARef<T>` may clone it and get an
>  // `ARef<T>` on that thread, so the thread may ultimately access `T` using a mutable reference, for
>  // example, when the reference count reaches zero and `T` is dropped.
> -unsafe impl<T: AlwaysRefCounted + Sync + Send> Sync for ARef<T> {}
> +unsafe impl<T: RefCounted + Sync + Send> Sync for ARef<T> {}
>  
>  // Even if `T` is pinned, pointers to `T` can still move.
> -impl<T: AlwaysRefCounted> Unpin for ARef<T> {}
> +impl<T: RefCounted> Unpin for ARef<T> {}
>  
> -impl<T: AlwaysRefCounted> ARef<T> {
> +impl<T: RefCounted> ARef<T> {
>      /// Creates a new instance of [`ARef`].
>      ///
>      /// It takes over an increment of the reference count on the underlying object.
> @@ -120,12 +133,12 @@ pub unsafe fn from_raw(ptr: NonNull<T>) -> Self {
>      ///
>      /// ```
>      /// use core::ptr::NonNull;
> -    /// use kernel::sync::aref::{ARef, AlwaysRefCounted};
> +    /// use kernel::sync::aref::{ARef, RefCounted};
>      ///
>      /// struct Empty {}
>      ///
>      /// # // SAFETY: TODO.
> -    /// unsafe impl AlwaysRefCounted for Empty {
> +    /// unsafe impl RefCounted for Empty {
>      ///     fn inc_ref(&self) {}
>      ///     unsafe fn dec_ref(_obj: NonNull<Self>) {}
>      /// }
> @@ -143,7 +156,7 @@ pub fn into_raw(me: Self) -> NonNull<T> {
>      }
>  }
>  
> -impl<T: AlwaysRefCounted> Clone for ARef<T> {
> +impl<T: RefCounted> Clone for ARef<T> {
>      fn clone(&self) -> Self {
>          self.inc_ref();
>          // SAFETY: We just incremented the refcount above.
> @@ -151,7 +164,7 @@ fn clone(&self) -> Self {
>      }
>  }
>  
> -impl<T: AlwaysRefCounted> Deref for ARef<T> {
> +impl<T: RefCounted> Deref for ARef<T> {
>      type Target = T;
>  
>      fn deref(&self) -> &Self::Target {
> @@ -168,7 +181,7 @@ fn from(b: &T) -> Self {
>      }
>  }
>  
> -impl<T: AlwaysRefCounted> Drop for ARef<T> {
> +impl<T: RefCounted> Drop for ARef<T> {
>      fn drop(&mut self) {
>          // SAFETY: The type invariants guarantee that the `ARef` owns the reference we're about to
>          // decrement.
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 38273f4eedb5..6259430b0ca3 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -10,7 +10,12 @@
>      pid_namespace::PidNamespace,
>      prelude::*,
>      sync::aref::ARef,
> -    types::{NotThreadSafe, Opaque},
> +    types::{
> +        AlwaysRefCounted,
> +        NotThreadSafe,
> +        Opaque,
> +        RefCounted, //
> +    },
>  };
>  use core::{
>      ops::Deref,
> @@ -347,7 +352,7 @@ pub fn group_leader(&self) -> &Task {
>  }
>  
>  // SAFETY: The type invariants guarantee that `Task` is always refcounted.
> -unsafe impl crate::sync::aref::AlwaysRefCounted for Task {
> +unsafe impl RefCounted for Task {
>      #[inline]
>      fn inc_ref(&self) {
>          // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> @@ -361,6 +366,10 @@ unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Task>` from a
> +// `&Task`.
> +unsafe impl AlwaysRefCounted for Task {}
> +
>  impl PartialEq for Task {
>      #[inline]
>      fn eq(&self, other: &Self) -> bool {
> diff --git a/rust/kernel/types.rs b/rust/kernel/types.rs
> index 4aec7b699269..9b96aa2ebdb7 100644
> --- a/rust/kernel/types.rs
> +++ b/rust/kernel/types.rs
> @@ -18,7 +18,8 @@
>      },
>      sync::aref::{
>          ARef,
> -        AlwaysRefCounted, //
> +        AlwaysRefCounted,
> +        RefCounted, //
>      }, //
>  };
>  
> diff --git a/rust/kernel/usb.rs b/rust/kernel/usb.rs
> index 9c17a672cd27..90b13e65cc82 100644
> --- a/rust/kernel/usb.rs
> +++ b/rust/kernel/usb.rs
> @@ -18,7 +18,10 @@
>          to_result, //
>      },
>      prelude::*,
> -    sync::aref::AlwaysRefCounted,
> +    sync::aref::{
> +        AlwaysRefCounted,
> +        RefCounted, //
> +    },
>      types::Opaque,
>      ThisModule, //
>  };
> @@ -381,7 +384,7 @@ fn as_ref(&self) -> &Device {
>  }
>  
>  // SAFETY: Instances of `Interface` are always reference-counted.
> -unsafe impl AlwaysRefCounted for Interface {
> +unsafe impl RefCounted for Interface {
>      fn inc_ref(&self) {
>          // SAFETY: The invariants of `Interface` guarantee that `self.as_raw()`
>          // returns a valid `struct usb_interface` pointer, for which we will
> @@ -395,6 +398,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Interface>` from a
> +// `&Interface`.
> +unsafe impl AlwaysRefCounted for Interface {}
> +
>  // SAFETY: A `Interface` is always reference-counted and can be released from any thread.
>  unsafe impl Send for Interface {}
>  
> @@ -432,7 +439,7 @@ fn as_raw(&self) -> *mut bindings::usb_device {
>  kernel::impl_device_context_into_aref!(Device);
>  
>  // SAFETY: Instances of `Device` are always reference-counted.
> -unsafe impl AlwaysRefCounted for Device {
> +unsafe impl RefCounted for Device {
>      fn inc_ref(&self) {
>          // SAFETY: The invariants of `Device` guarantee that `self.as_raw()`
>          // returns a valid `struct usb_device` pointer, for which we will
> @@ -446,6 +453,10 @@ unsafe fn dec_ref(obj: NonNull<Self>) {
>      }
>  }
>  
> +// SAFETY: We do not implement `Ownable`, thus it is okay to obtain an `ARef<Device>` from a
> +// `&Device`.
> +unsafe impl AlwaysRefCounted for Device {}
> +
>  impl<Ctx: device::DeviceContext> AsRef<device::Device<Ctx>> for Device<Ctx> {
>      fn as_ref(&self) -> &device::Device<Ctx> {
>          // SAFETY: By the type invariant of `Self`, `self.as_raw()` is a pointer to a valid
> 
> -- 
> 2.51.2
> 
> 

^ permalink raw reply

* Re: [PATCH v17 08/10] rust: aref: update formatting of use statements
From: Onur Özkan @ 2026-06-23 17:55 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman,
	Dave Ertman, Ira Weiny, Leon Romanovsky, Paul Moore, Serge Hallyn,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Alexander Viro,
	Christian Brauner, Jan Kara, Daniel Almeida, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
	Krzysztof Wilczyński, Boqun Feng, Uladzislau Rezki,
	Lorenzo Stoakes, Vlastimil Babka, Liam R. Howlett, Igor Korotin,
	Pavel Tikhomirov, linux-kernel, rust-for-linux, linux-block,
	linux-security-module, dri-devel, linux-fsdevel, linux-mm,
	linux-pm, linux-pci, driver-core
In-Reply-To: <20260604-unique-ref-v17-8-7b4c3d2930b9@kernel.org>

On Thu, 04 Jun 2026 22:11:20 +0200
Andreas Hindborg <a.hindborg@kernel.org> wrote:

> Update formatting if use statements in preparation for next commit.

I guess you meant "formatting use statements"? Also, why not do this in
the next commit directly?

Onur

> 
> Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
> ---
>  rust/kernel/sync/aref.rs | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/rust/kernel/sync/aref.rs b/rust/kernel/sync/aref.rs
> index 7491382bcf29..818c84fa923a 100644
> --- a/rust/kernel/sync/aref.rs
> +++ b/rust/kernel/sync/aref.rs
> @@ -17,7 +17,12 @@
>  //! [`Arc`]: crate::sync::Arc
>  //! [`Arc<T>`]: crate::sync::Arc
>  
> -use core::{marker::PhantomData, mem::ManuallyDrop, ops::Deref, ptr::NonNull};
> +use core::{
> +    marker::PhantomData,
> +    mem::ManuallyDrop,
> +    ops::Deref,
> +    ptr::NonNull, //
> +};
>  
>  /// Types that are internally reference counted.
>  ///
> 
> -- 
> 2.51.2
> 
> 

^ permalink raw reply

* Re: [PATCH v17 10/10] rust: page: add `from_raw()`
From: Onur Özkan @ 2026-06-23 17:52 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman,
	Dave Ertman, Ira Weiny, Leon Romanovsky, Paul Moore, Serge Hallyn,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Alexander Viro,
	Christian Brauner, Jan Kara, Daniel Almeida, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
	Krzysztof Wilczyński, Boqun Feng, Uladzislau Rezki,
	Lorenzo Stoakes, Vlastimil Babka, Liam R. Howlett, Igor Korotin,
	Pavel Tikhomirov, linux-kernel, rust-for-linux, linux-block,
	linux-security-module, dri-devel, linux-fsdevel, linux-mm,
	linux-pm, linux-pci, driver-core, Onur Özkan
In-Reply-To: <20260604-unique-ref-v17-10-7b4c3d2930b9@kernel.org>

On Thu, 04 Jun 2026 22:11:22 +0200
Andreas Hindborg <a.hindborg@kernel.org> wrote:

> From: Andreas Hindborg <a.hindborg@samsung.com>
> 
> Add a method to `Page` that allows construction of an instance from a
> `struct page` pointer.
> 
> Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com>
> ---
>  rust/kernel/page.rs | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
> index 844c75e54134..d56ae597f692 100644
> --- a/rust/kernel/page.rs
> +++ b/rust/kernel/page.rs
> @@ -214,6 +214,18 @@ pub fn nid(&self) -> i32 {
>          unsafe { bindings::page_to_nid(self.as_ptr()) }
>      }
>  
> +    /// Create a `&Page` from a raw `struct page` pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// `ptr` must be convertible to a shared reference with a lifetime of `'a`.
> +    #[inline]
> +    pub unsafe fn from_raw<'a>(ptr: *const bindings::page) -> &'a Self {
> +        // SAFETY: By function safety requirements, `ptr` is not null and is convertible to a shared
> +        // reference.
> +        unsafe { &*ptr.cast() }
> +    }
> +
>      /// Runs a piece of code with this page mapped to an address.
>      ///
>      /// The page is unmapped when this call returns.
> 
> -- 
> 2.51.2
> 
> 

Reviewed-by: Onur Özkan <work@onurozkan.dev>

^ permalink raw reply

* Re: [PATCHv2 6/6] block: validate user space vectors during extraction
From: Keith Busch @ 2026-06-23 16:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-block, linux-fsdevel, dm-devel, axboe, brauner,
	djwong, viro, stable
In-Reply-To: <20260623151021.GA14919@lst.de>

On Tue, Jun 23, 2026 at 05:10:21PM +0200, Christoph Hellwig wrote:
> > +	/*
> > +	 * The vectors are owned and laid out by the caller; we only forward
> > +	 * them. Most callers are already aligned, but io_uring can place a
> > +	 * user chosen offset through a registered buffer, where only the first
> > +	 * vector may be unaligned.
> > +	 */
> > +	return !(mp_bvec_iter_offset(bio->bi_io_vec, bio->bi_iter) &
> > +							mem_align_mask);
> 
> I don't fully understand the comment.  I guess this is to say ITER_BVEC
> users had better not create any alignment gaps?  Maybe we should also
> clearly document that in uio.h?

Exactly, the in-kernel users of ITER_BVEC that allocate their own
buffers are, as far as I know, aligned already. Fabric storage targets
like nvme allocate their own SGLs on page boundaries so the bio is
aligned at the point it was constructed.

The ones that forward user buffers like loop and zloop are addressed in
the previous two patches. They generally should have been fine for most
hardware without those updates, but they're included in case a backing
device has more restrictive constraints than 512b "sector_t" aligned.

The only other user space provided alignment that I think may trip this
up is the io_uring registered buffer, so that's what I'm trying to call
out here.
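
To make the check under discussion concrete, here is a minimal user-space
sketch in C. It is not the kernel code: start_offset_aligned() and its
parameters are hypothetical names, and it assumes the mask is
(alignment - 1) for a power-of-two alignment, like the mem_align_mask
argument in the quoted hunk. It only shows how a starting offset is tested
against such a mask.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * seg_offset: byte offset of the first segment within its page
 * iter_done:  bytes of that segment already consumed by the iterator
 * mem_align_mask: (alignment - 1), alignment being a power of two
 *
 * Returns true when the effective starting offset has none of the
 * mask bits set, i.e. it is a multiple of the alignment.
 */
bool start_offset_aligned(size_t seg_offset, size_t iter_done,
			  size_t mem_align_mask)
{
	return !((seg_offset + iter_done) & mem_align_mask);
}
```

With a 4-byte alignment (mask 3), a start at byte 516 passes and a start at
byte 513 does not. Only the first vector is tested here, which matches the
case called out above where a registered buffer carries a user chosen
starting offset.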

^ permalink raw reply

* Re: [PATCHv2 6/6] block: validate user space vectors during extraction
From: Christoph Hellwig @ 2026-06-23 15:10 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch, stable
In-Reply-To: <20260622174241.2299563-7-kbusch@meta.com>

> +#ifdef CONFIG_DEBUG_KERNEL

That's a pretty broad option.  Not that I have any better idea off the
bat.

> +static inline bool bio_iov_bvec_aligned(const struct bio *bio,
> +					unsigned mem_align_mask)
> +{
> +	/*
> +	 * The vectors are owned and laid out by the caller; we only forward
> +	 * them. Most callers are already aligned, but io_uring can place a
> +	 * user chosen offset through a registered buffer, where only the first
> +	 * vector may be unaligned.
> +	 */
> +	return !(mp_bvec_iter_offset(bio->bi_io_vec, bio->bi_iter) &
> +							mem_align_mask);

I don't fully understand the comment.  I guess this is to say ITER_BVEC
users had better not create any alignment gaps?  Maybe we should also
clearly document that in uio.h?

>  	return bio_iov_iter_get_pages(bio, iter,
> +			bdev_dma_alignment(bdev),

Nit: this easily fits onto the previous line.

Otherwise this looks good.

^ permalink raw reply

* Re: [PATCHv2 1/6] block: introduce bio_endio_errno helper
From: Christoph Hellwig @ 2026-06-23 15:07 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Keith Busch, linux-block, linux-fsdevel,
	dm-devel, axboe, brauner, djwong, viro
In-Reply-To: <ajqgzcEQm6BthFvx@kbusch-mbp>

On Tue, Jun 23, 2026 at 09:05:49AM -0600, Keith Busch wrote:
> On Tue, Jun 23, 2026 at 04:54:31PM +0200, Christoph Hellwig wrote:
> > On Mon, Jun 22, 2026 at 10:42:36AM -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > 
> > > No functional change; purely introducing a convenience function.
> > 
> > I've been deeply into untangling the 1:1 BLK_STS_ mapping to errnos,
> > as propagating them up that way often causes more issues than it
> > solves.  So if we can avoid it, I'd rather not add more helpers to
> > facilitate that (even if the helpers are just the messenger and not
> > the cause of the problem).
> 
> Sure, that's fine. I'm not sure what you have in mind for untangling the
> errno:blk_status_t mappings, but I can certainly have the new users this
> series introduces open code it like the existing users if that's
> alright.

I've tried a few things and banged my head against the wall, so I'm
not entirely sure yet either.

^ permalink raw reply

* Re: [PATCHv2 5/6] zloop: set dma_alignment from the backing files for direct I/O
From: Christoph Hellwig @ 2026-06-23 15:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch
In-Reply-To: <20260622174241.2299563-6-kbusch@meta.com>

On Mon, Jun 22, 2026 at 10:42:40AM -0700, Keith Busch wrote:
>  {
>  	struct block_device *sb_bdev = zone->file->f_mapping->host->i_sb->s_bdev;
>  	struct kstat st;
> +	bool have_dioalign = !vfs_getattr(&zone->file->f_path, &st,
> +					  STATX_DIOALIGN, 0) &&
> +			     (st.result_mask & STATX_DIOALIGN);

This is getting a bit crazy for an assignment :)

Maybe refactor this along the lines of the loop.c code?

> +	/* Direct I/O hands the request's pages to the backing files unchanged. */

Overly long line.


^ permalink raw reply

* Re: [PATCHv2 1/6] block: introduce bio_endio_errno helper
From: Keith Busch @ 2026-06-23 15:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-block, linux-fsdevel, dm-devel, axboe, brauner,
	djwong, viro
In-Reply-To: <20260623145431.GA13628@lst.de>

On Tue, Jun 23, 2026 at 04:54:31PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 22, 2026 at 10:42:36AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > No functional change; purely introducing a convenience function.
> 
> I've been deeply into untangling the 1:1 BLK_STS_ mapping to errnos,
> as propagating them up that way often causes more issues than it
> solves.  So if we can avoid it, I'd rather not add more helpers to
> facilitate that (even if the helpers are just the messenger and not
> the cause of the problem).

Sure, that's fine. I'm not sure what you have in mind for untangling the
errno:blk_status_t mappings, but I can certainly have the new users this
series introduces open code it like the existing users if that's
alright.

^ permalink raw reply

* Re: [PATCHv2 4/6] loop: set dma_alignment from the backing file for direct I/O
From: Christoph Hellwig @ 2026-06-23 15:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch
In-Reply-To: <20260622174241.2299563-5-kbusch@meta.com>

On Mon, Jun 22, 2026 at 10:42:39AM -0700, Keith Busch wrote:
>  	/*
> -	 * Use the minimal dio alignment of the file system if provided.
> +	 * Use the dio alignment of the file system if provided.  dio_offset_align
> +	 * is the minimum dio size and offset; dio_mem_align is the buffer memory
> +	 * alignment, kept as a mask to become the loop device's dma_alignment in
> +	 * direct I/O mode where the buffer is handed to the backing file unchanged.

A bunch of overly long lines here.

> +	 * In direct I/O the user pages are handed to the backing file as-is, so
> +	 * the backing's DMA alignment requirement applies to them.  Advertise it
> +	 * so misaligned I/O is rejected at this device's entry instead of being
> +	 * dispatched to the backend.  Buffered I/O copies through the page cache
> +	 * and imposes no such requirement.
> +	 */

More line spillover here.

> +	if (lo->lo_flags & LO_FLAGS_DIRECT_IO)
> +		lim->dma_alignment = lo->lo_dio_mem_align;
> +	else
> +		lim->dma_alignment = SECTOR_SIZE - 1;

Despite the comment above this does enforce a SECTOR_SIZE dma
alignment for buffered I/O.  Shouldn't this be our lowest supported
value (or dword alignment to match real devices)?

> +	lim = queue_limits_start_update(lo->lo_queue);
> +	if (lo->lo_flags & LO_FLAGS_DIRECT_IO)
> +		lim.dma_alignment = lo->lo_dio_mem_align;
> +	else
> +		lim.dma_alignment = SECTOR_SIZE - 1;

Should this and the above copy of this assignment be factored into a
helper?
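
One possible shape for such a helper, as a stand-alone sketch rather than a
proposed patch. The struct type and the name loop_sketch_dma_alignment()
are hypothetical stand-ins for the loop driver's structures, the two
constants are local placeholders, and the buffered I/O branch simply keeps
the SECTOR_SIZE - 1 value from the quoted hunks, which is the value
questioned above.

```c
/* Local placeholders so the sketch builds outside the kernel tree. */
#define SKETCH_SECTOR_SIZE	512u
#define SKETCH_DIRECT_IO	0x10u

/* Hypothetical stand-in for the two loop device fields the hunks use. */
struct loop_sketch {
	unsigned int lo_flags;		/* device flags */
	unsigned int lo_dio_mem_align;	/* backing file memory alignment mask */
};

/*
 * Return the DMA alignment mask to advertise: the backing file's mask in
 * direct I/O mode, otherwise the sector-sized default from the quoted code.
 */
unsigned int loop_sketch_dma_alignment(const struct loop_sketch *lo)
{
	if (lo->lo_flags & SKETCH_DIRECT_IO)
		return lo->lo_dio_mem_align;
	return SKETCH_SECTOR_SIZE - 1;
}
```

Returning the mask instead of writing it lets both call sites, the one
using a pointer to the limits and the one using a local copy, assign the
result to their own dma_alignment field.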


^ permalink raw reply

* Re: [PATCHv2 3/6] block: fix dio leak on metadata mapping error
From: Christoph Hellwig @ 2026-06-23 15:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch
In-Reply-To: <20260622174241.2299563-4-kbusch@meta.com>

On Mon, Jun 22, 2026 at 10:42:38AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> A failed integrity mapping holds a dio reference, so we need to go
> through the full bio ending in case there were previously submitted
> bio's in the sequence.

Yeah, the goto fail is for sure wrong here.  I have a vague memory
of seeing the same or at least a very similar patch from others before,
but right now I'm too overloaded to find out if that really was the
case.

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCHv2 2/6] block: report the actual status
From: Keith Busch @ 2026-06-23 14:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-block, linux-fsdevel, dm-devel, axboe, brauner,
	djwong, viro
In-Reply-To: <20260623145511.GB13628@lst.de>

On Tue, Jun 23, 2026 at 04:55:11PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 22, 2026 at 10:42:37AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Rather than assume EIO, set the actual reported status for user space
> > informational purposes.
> 
> Where "informational purposes" primarily means not dropping the EINVAL
> for incorrect alignment, right?

It could be any possible error, but for the practical purposes of this
series, yes, EINVAL is the status I need forwarded. But EFAULT was also
always a real possibility that wouldn't have been reported.

^ permalink raw reply

* Re: [PATCHv2 2/6] block: report the actual status
From: Christoph Hellwig @ 2026-06-23 14:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch
In-Reply-To: <20260622174241.2299563-3-kbusch@meta.com>

On Mon, Jun 22, 2026 at 10:42:37AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Rather than assume EIO, set the actual reported status for user space
> informational purposes.

Where "informational purposes" primarily means not dropping the EINVAL
for incorrect alignment, right?  Maybe state that more clearly.


^ permalink raw reply

* Re: [PATCHv2 1/6] block: introduce bio_endio_errno helper
From: Christoph Hellwig @ 2026-06-23 14:54 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch
In-Reply-To: <20260622174241.2299563-2-kbusch@meta.com>

On Mon, Jun 22, 2026 at 10:42:36AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> No functional change; purely introducing a convenience function.

I've been deeply into untangling the 1:1 BLK_STS_ mapping to errnos,
as propagating them up that way often causes more issues than it
solves.  So if we can avoid it, I'd rather not add more helpers to
facilitate that (even if the helpers are just the messenger and not
the cause of the problem).


^ permalink raw reply

* [PATCH 2/2] block: handle REQ_OP_ZONE_APPEND in __bio_integrity_action
From: Christoph Hellwig @ 2026-06-23 14:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caleb Sander Mateos, Martin K. Petersen, linux-block
In-Reply-To: <20260623142957.1839474-1-hch@lst.de>

Otherwise zone append commands will miss their integrity data.  While
this works "fine" for auto-PI, it breaks file system PI and non-PI
metadata.

With this fix 512-byte block PI and non-PI metadata work fine on native
and emulated Zone Append.  4k block size PI works fine with the
"block: fix integrity offset/length conversions" series from
Caleb Sander Mateos.

Fixes: df3c485e0e60 ("block: switch on bio operation in bio_integrity_prep")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio-integrity.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 488eba228c6e..9fb722ace0a8 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -38,6 +38,7 @@ unsigned int __bio_integrity_action(struct bio *bio)
 		}
 		return BI_ACT_BUFFER | BI_ACT_CHECK;
 	case REQ_OP_WRITE:
+	case REQ_OP_ZONE_APPEND:
 		/*
 		 * Flush masquerading as write?
 		 */
-- 
2.53.0


^ permalink raw reply related

* [PATCH 1/2] block: fix GFP_ flags confusion in bio_integrity_alloc_buf
From: Christoph Hellwig @ 2026-06-23 14:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caleb Sander Mateos, Martin K. Petersen, linux-block
In-Reply-To: <20260623142957.1839474-1-hch@lst.de>

bio_integrity_alloc_buf's usage of GFP_ flags is messed up.  For one it
mixes GFP_NOFS and GFP_NOIO for neighbouring allocations, but it also
makes the allocations fail more often than needed.  That code was copied
from bio_alloc_bioset which needs to do that so that it can punt to the
rescuer workqueue, but none of that is needed for the integrity
allocations that either sit in the file system or at the very bottom
of the I/O stack.  Failing early means we'll do a fully waiting
allocation from the mempool ->alloc callback which is usually much
larger than required.

Fix this by passing a gfp_t so that the file system path can pass
GFP_NOFS and the auto-integrity code can pass GFP_NOIO, and don't
modify the allocation type except for disabling warnings.

Fixes: ec7f31b2a2d3 ("block: make bio auto-integrity deadlock safe")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio-integrity-auto.c    | 2 +-
 block/bio-integrity-fs.c      | 4 ++--
 block/bio-integrity.c         | 8 +++-----
 include/linux/bio-integrity.h | 2 +-
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/bio-integrity-auto.c b/block/bio-integrity-auto.c
index 353eed632fcc..b1c733ecfd2e 100644
--- a/block/bio-integrity-auto.c
+++ b/block/bio-integrity-auto.c
@@ -94,7 +94,7 @@ void bio_integrity_prep(struct bio *bio, unsigned int action)
 	bio_integrity_init(bio, &bid->bip, &bid->bvec, 1);
 	bid->bio = bio;
 	bid->bip.bip_flags |= BIP_BLOCK_INTEGRITY;
-	bio_integrity_alloc_buf(bio, action & BI_ACT_ZERO);
+	bio_integrity_alloc_buf(bio, GFP_NOIO, action & BI_ACT_ZERO);
 	if (action & BI_ACT_CHECK)
 		bio_integrity_setup_default(bio);
 
diff --git a/block/bio-integrity-fs.c b/block/bio-integrity-fs.c
index 0daa42d9ead7..9c5fe5fa8f0d 100644
--- a/block/bio-integrity-fs.c
+++ b/block/bio-integrity-fs.c
@@ -23,10 +23,10 @@ unsigned int fs_bio_integrity_alloc(struct bio *bio)
 	if (!action)
 		return 0;
 
-	iib = mempool_alloc(&fs_bio_integrity_pool, GFP_NOIO);
+	iib = mempool_alloc(&fs_bio_integrity_pool, GFP_NOFS);
 	bio_integrity_init(bio, &iib->bip, &iib->bvec, 1);
 
-	bio_integrity_alloc_buf(bio, action & BI_ACT_ZERO);
+	bio_integrity_alloc_buf(bio, GFP_NOFS, action & BI_ACT_ZERO);
 	if (action & BI_ACT_CHECK)
 		bio_integrity_setup_default(bio);
 	return action;
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index e796de1a749e..488eba228c6e 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -64,20 +64,18 @@ unsigned int __bio_integrity_action(struct bio *bio)
 }
 EXPORT_SYMBOL_GPL(__bio_integrity_action);
 
-void bio_integrity_alloc_buf(struct bio *bio, bool zero_buffer)
+void bio_integrity_alloc_buf(struct bio *bio, gfp_t gfp, bool zero_buffer)
 {
 	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 	unsigned int len = bio_integrity_bytes(bi, bio_sectors(bio));
-	gfp_t gfp = GFP_NOIO | (zero_buffer ? __GFP_ZERO : 0);
 	void *buf;
 
-	buf = kmalloc(len, (gfp & ~__GFP_DIRECT_RECLAIM) |
-			__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN);
+	buf = kmalloc(len, gfp | __GFP_NOWARN | (zero_buffer ? __GFP_ZERO : 0));
 	if (unlikely(!buf)) {
 		struct page *page;
 
-		page = mempool_alloc(&integrity_buf_pool, GFP_NOFS);
+		page = mempool_alloc(&integrity_buf_pool, gfp);
 		if (zero_buffer)
 			memset(page_address(page), 0, len);
 		bvec_set_page(&bip->bip_vec[0], page, len, 0);
diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h
index af5178434ec6..c3dda32fd803 100644
--- a/include/linux/bio-integrity.h
+++ b/include/linux/bio-integrity.h
@@ -141,7 +141,7 @@ static inline int bio_integrity_add_page(struct bio *bio, struct page *page,
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
-void bio_integrity_alloc_buf(struct bio *bio, bool zero_buffer);
+void bio_integrity_alloc_buf(struct bio *bio, gfp_t gfp, bool zero_buffer);
 void bio_integrity_free_buf(struct bio_integrity_payload *bip);
 void bio_integrity_setup_default(struct bio *bio);
 
-- 
2.53.0
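
A side note on the kmalloc() call in this patch, which ORs a conditional
__GFP_ZERO term into the flags. In C the conditional operator binds more
loosely than bitwise OR, so the conditional term needs its own parentheses
to act as a single operand. The user-space sketch below illustrates the
difference; the SK_ constants and function names are hypothetical and are
not the kernel's gfp definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define SK_BASE		0x1u
#define SK_NOWARN	0x2u
#define SK_ZERO		0x4u

/*
 * How "gfp | SK_NOWARN | zero ? SK_ZERO : 0" is parsed: the whole OR
 * expression becomes the condition, so the caller's flags are lost.
 */
unsigned int flags_as_parsed(unsigned int gfp, bool zero)
{
	return (gfp | SK_NOWARN | zero) ? SK_ZERO : 0;
}

/* The intended composition: only the zeroing bit is conditional. */
unsigned int flags_intended(unsigned int gfp, bool zero)
{
	return gfp | SK_NOWARN | (zero ? SK_ZERO : 0);
}
```

With SK_BASE and zero set to false, the first form yields only SK_ZERO,
while the second yields SK_BASE | SK_NOWARN.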


^ permalink raw reply related

* PI fixes
From: Christoph Hellwig @ 2026-06-23 14:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caleb Sander Mateos, Martin K. Petersen, linux-block

Hi all,

this series has two unrelated PI/metadata fixes that came up
during a little testing surge.

Diffstat:
 block/bio-integrity-auto.c    |    2 +-
 block/bio-integrity-fs.c      |    4 ++--
 block/bio-integrity.c         |    9 ++++-----
 include/linux/bio-integrity.h |    2 +-
 4 files changed, 8 insertions(+), 9 deletions(-)

^ permalink raw reply

* Re: [PATCH v4 9/9] rust: macros: remove `THIS_MODULE` static from `module!`
From: Gary Guo @ 2026-06-23 13:53 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-9-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> All users have been migrated to `ModuleMetadata::THIS_MODULE` const or
> `this_module::<LocalModule>()` helper. The `static THIS_MODULE`
> generated by the `module!` macro is no longer referenced anywhere,
> so remove it to avoid having two sources of the same `ThisModule`
> pointer.
> 
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/macros/module.rs | 16 ----------------
>  1 file changed, 16 deletions(-)


^ permalink raw reply

* Re: [PATCH v4 8/9] rust: binder: use `LocalModule` for `THIS_MODULE`
From: Gary Guo @ 2026-06-23 13:53 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-8-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Replace the `THIS_MODULE` static reference in the binder fops with
> `this_module::<LocalModule>()`, consistent with the move of
> `THIS_MODULE` into the `ModuleMetadata` trait.
> 
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  drivers/android/binder/rust_binder_main.rs | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)


^ permalink raw reply

* Re: [PATCH v4 7/9] rust: configfs: use `LocalModule` for `THIS_MODULE`
From: Gary Guo @ 2026-06-23 13:53 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-7-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Replace the `THIS_MODULE` static reference in the `configfs_attrs!`
> macro with `this_module::<LocalModule>()`, and update
> rnull to import `LocalModule` instead of `THIS_MODULE`, consistent
> with the move of `THIS_MODULE` into the `ModuleMetadata` trait.
>
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
> ---
>  drivers/block/rnull/configfs.rs | 6 ++----
>  rust/kernel/configfs.rs         | 8 +++++---
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
> index c10a55fc58948..b2547ad1e5ddd 100644
> --- a/drivers/block/rnull/configfs.rs
> +++ b/drivers/block/rnull/configfs.rs
> @@ -1,9 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> -use super::{
> -    NullBlkDevice,
> -    THIS_MODULE, //
> -};
> +use super::NullBlkDevice;
> +use crate::LocalModule;
>  use kernel::{
>      block::mq::gen_disk::{
>          GenDisk,
> diff --git a/rust/kernel/configfs.rs b/rust/kernel/configfs.rs
> index 2339c6467325d..b542422115461 100644
> --- a/rust/kernel/configfs.rs
> +++ b/rust/kernel/configfs.rs
> @@ -875,7 +875,7 @@ fn as_ptr(&self) -> *const bindings::config_item_type {
>  ///                 configfs::Subsystem<Configuration>,
>  ///                 Configuration
>  ///                 >::new_with_child_ctor::<N,Child>(
> -///             &THIS_MODULE,
> +///             ::kernel::module::this_module::<LocalModule>(),

This should be `crate::LocalModule`.

Best,
Gary

>  ///             &CONFIGURATION_ATTRS
>  ///         );
>  ///
> @@ -1021,7 +1021,8 @@ macro_rules! configfs_attrs {
>  
>                      static [< $data:upper _TPE >] : $crate::configfs::ItemType<$container, $data>  =
>                          $crate::configfs::ItemType::<$container, $data>::new::<N>(
> -                            &THIS_MODULE, &[<$ data:upper _ATTRS >]
> +                            $crate::module::this_module::<LocalModule>(),
> +                            &[<$ data:upper _ATTRS >]
>                          );
>                  )?
>  
> @@ -1030,7 +1031,8 @@ macro_rules! configfs_attrs {
>                          $crate::configfs::ItemType<$container, $data>  =
>                              $crate::configfs::ItemType::<$container, $data>::
>                              new_with_child_ctor::<N, $child>(
> -                                &THIS_MODULE, &[<$ data:upper _ATTRS >]
> +                                $crate::module::this_module::<LocalModule>(),
> +                                &[<$ data:upper _ATTRS >]
>                              );
>                  )?
>  




* Re: [PATCH v4 6/9] rust: miscdevice: set fops.owner from driver module pointer
From: Gary Guo @ 2026-06-23 13:51 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-6-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Set the miscdevice fops owner field from the driver module pointer
> via the `this_module::<T::OwnerModule>()` helper, instead of
> defaulting to null.
> 
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/kernel/miscdevice.rs | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)



* Re: [PATCH v4 4/9] rust: macros: auto-insert OwnerModule in #[vtable]
From: Gary Guo @ 2026-06-23 13:50 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260623-fix-fops-owner-v4-4-0daf5f077d5c@linux.dev>

On Tue Jun 23, 2026 at 7:29 AM BST, Alvin Sun wrote:
> Auto-add `type OwnerModule: ::kernel::ModuleMetadata;` as a required
> associated type on the trait side if not already defined, and
> auto-insert `type OwnerModule = crate::LocalModule;` on the impl side
> if not explicitly provided, eliminating the need to manually declare
> and implement `OwnerModule` in every vtable trait and impl.
> 
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Suggested-by: Gary Guo <gary@garyguo.net>
> Link: https://lore.kernel.org/all/DIMMWHUOLPSH.13JFRHDKDQJGO@garyguo.net
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/macros/lib.rs    |  6 ++++++
>  rust/macros/vtable.rs | 41 ++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 42 insertions(+), 5 deletions(-)



