Linux block layer
* [PATCH v2 00/83] block: rnull: complete the rust null block driver
From: Andreas Hindborg @ 2026-06-09 19:07 UTC
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux, Yuan Tan, Andreas Hindborg

This series aims to bring the Rust null block driver to feature parity
with the C null_blk driver.

There are quite a few changes from v1 in this version. I tried to capture
everything in the change log, but I might have missed something along
the way.

I have prepared a tree with all dependencies applied at [1].

Best regards,
Andreas Hindborg

[1] git https://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux.git rnull-v7.1-rc2

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
Changes in v2:
- Fix shift direction in transfer length calculation.
- Retry page preload after reacquiring locks.
- Fix a bug where badblocks did not correctly limit IO size.
- Close TOCTOU window in configfs power-check stores (Alice).
- Use `bool` for semantically-boolean module parameters (Alice).
- Take `NumaNode` instead of a raw `i32` as the home node argument to `TagSet` (Alice).
- Use `c_void`, `c_uint`, and `c_int` from the prelude in `hctx` private data support (Alice).
- Use `size_of` from the prelude in `Request` private data support (Alice).
- Return `Ok(())` from `new_request_data` instead of `pin_init::zeroed` (Alice).
- Add `// CAST:` annotations to casts (Alice).
- Expand the comment on the `BLK_STS_.*` bindgen blocklist entry.
- Depend on "rust: module_param: return copy from value() for Copy types".
- block: rust: introduce `kernel::block::bio` module (Alice):
  - Use `kernel::fmt::Display` for `Bio` and cache `raw_iter()`.
  - Mark the `bio_advance_iter_single` helper `__rust_helper`.
  - Use a `srctree/` link for the C header.
  - Remove the stale reference-counting invariants from `Bio`.
  - Take `Pin<&mut Self>` in `Bio::segment_iter` and `Request::bio_mut`.
  - Document that the `bvec_iter` cursor can be copied and moved freely.
  - Use `&raw mut` instead of `core::ptr::from_mut`.
- Narrow the `unsafe` block in `Request::command()` using `BitAnd` (Alice, Gary).
- Use `c_void` from the prelude and drop a spurious blank line in the `TagSet` flags module (Alice).
- Drop the `Tree` type alias in favor of `XArray<TreeNode>` in rnull (Alice).
- Use a `NoIo` memory allocation scope in `queue_rq` rather than passing `GFP_NOIO`.
- Add the missing comma between `memory_backed` and `submit_queues` in the configfs feature listing (Alice).
- Fix the `use_per_node_hctx` store to set `submit_queues` to the online node count instead of multiplying by it (Alice).
- Use `static_assert!` instead of a `build_assert!` constant for the page/sector width check (Alice).
- Fix a typo in the `TagSet::new` doc comment (Ken).
- block: rust: add `BadBlocks` for bad block tracking (Alice):
  - Remove newline after `use` statements.
  - Add C header link.
  - Convert boolean to int with `into`.
  - Remove duplicated docs from `enabled`.
  - Use if/else rather than `then_some` in `set_bad`.
- Add a patch to rename `SECTOR_MASK` to `PAGE_SECTOR_MASK`.
- Use `pr_warn_once!` where applicable.
- Require `TagSet` private data to be `Send` for `TagSet` to be `Send`.
- Require `Operations::TagSetData: Sync`.
- Require `Operations::HwData: Send + Sync` and add a note on the bounds.
- Require `Operations::RequestData: Send` and add a note on the bound.
- Add `TagSet::flags` to obtain flags, and fix a bug in zoned emulation caused by taking a mutex under the RCU read lock.
- Link to v1: https://msgid.link/20260216-rnull-v6-19-rc5-send-v1-0-de9a7af4b469@kernel.org
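
For readers unfamiliar with the first changelog entry, a shift-direction bug in a length calculation typically looks like the following. This is a standalone sketch, not code from the series: the function names, the integer widths, and the constant are assumptions chosen for illustration, and only the 512-byte sector convention is a known fact.

```rust
/// log2 of the 512-byte sector size used by the block layer.
/// (Defined locally here; an assumption for this sketch.)
const SECTOR_SHIFT: u32 = 9;

/// Sectors to bytes: shift LEFT, i.e. multiply by 512.
/// Widening to u64 first keeps large sector counts from overflowing.
pub fn transfer_len_bytes(sectors: u32) -> u64 {
    u64::from(sectors) << SECTOR_SHIFT
}

/// Bytes to whole sectors: shift RIGHT, i.e. divide by 512 and
/// discard any partial trailing sector.
pub fn transfer_len_sectors(bytes: u64) -> u64 {
    bytes >> SECTOR_SHIFT
}
```

Swapping the two shift directions still compiles, which is why this class of bug is easy to miss in review: eight sectors would become zero bytes instead of 4096.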

Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: rust-for-linux@vger.kernel.org
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: "Liam R. Howlett" <liam@infradead.org>
To: Alice Ryhl <aliceryhl@google.com>
To: Andreas Hindborg <a.hindborg@kernel.org>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
To: Benno Lossin <lossin@kernel.org>
To: Björn Roy Baron <bjorn3_gh@protonmail.com>
To: Boqun Feng <boqun.feng@gmail.com>
To: Boqun Feng <boqun@kernel.org>
To: Danilo Krummrich <dakr@kernel.org>
To: FUJITA Tomonori <fujita.tomonori@gmail.com>
To: Frederic Weisbecker <frederic@kernel.org>
To: Gary Guo <gary@garyguo.net>
To: Jens Axboe <axboe@kernel.dk>
To: John Stultz <jstultz@google.com>
To: Lorenzo Stoakes <ljs@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Lyude Paul <lyude@redhat.com>
To: Miguel Ojeda <ojeda@kernel.org>
To: Stephen Boyd <sboyd@kernel.org>
To: Thomas Gleixner <tglx@kernel.org>
To: Trevor Gross <tmgross@umich.edu>

---
Andreas Hindborg (83):
      block: rust: fix `Send` bound for `GenDisk`
      rust: block: rename `SECTOR_MASK` to `PAGE_SECTOR_MASK`
      block: rnull: adopt new formatting guidelines
      block: rnull: add module parameters
      block: rnull: add macros to define configfs attributes
      block: rust: fix generation of bindings to `BLK_STS_.*`
      block: rust: change `queue_rq` request type to `Owned`
      block: rust: add `Request` private data support
      block: rust: document the lifetime of `Request`
      block: rust: allow `hrtimer::Timer` in `RequestData`
      block: rnull: add timer completion mode
      block: rust: introduce `kernel::block::bio` module
      block: rust: add `command` getter to `Request`
      block: rust: mq: use GFP_KERNEL from prelude
      block: rust: add `TagSet` flags
      block: rnull: add memory backing
      block: rnull: add submit queue count config option
      block: rnull: add `use_per_node_hctx` config option
      block: rust: allow specifying home node when constructing `TagSet`
      block: rnull: allow specifying the home numa node
      block: rust: add Request::sectors() method
      block: rust: mq: add max_hw_discard_sectors support to GenDiskBuilder
      block: rnull: add discard support
      block: rust: add `NoDefaultScheduler` flag for `TagSet`
      block: rnull: add no_sched module parameter and configfs attribute
      block: rust: change sector type from usize to u64
      block: rust: add `BadBlocks` for bad block tracking
      block: rust: mq: add Request::end() method for custom status codes
      block: rnull: add badblocks support
      block: rnull: add badblocks_once support
      block: rust: add `Segment::truncate`
      block: rnull: add partial I/O support for bad blocks
      block: rust: add `TagSet` private data support
      block: rust: add `hctx` private data support
      block: rnull: add volatile cache emulation
      block: rust: implement `Sync` for `GenDisk`.
      block: rust: add a back reference feature to `GenDisk`
      block: rust: introduce an idle type state for `Request`
      block: rust: add a request queue abstraction
      block: rust: add a method to get the request queue for a request
      block: rust: introduce `kernel::block::error`
      block: rust: require `queue_rq` to return a `BlkResult`
      block: rust: add `GenDisk::queue_data`
      block: rnull: add bandwidth limiting
      block: rnull: add blocking queue mode
      block: rnull: add shared tags
      block: rnull: add queue depth config option
      block: rust: add an abstraction for `bindings::req_op`
      block: rust: add a method to set the target sector of a request
      block: rust: move gendisk vtable construction to separate function
      block: rust: add zoned block device support
      block: rust: add `TagSet::flags`
      block: rnull: add zoned storage support
      block: rust: add `map_queues` support
      block: rust: add an abstraction for `struct blk_mq_queue_map`
      block: rust: add polled completion support
      block: rust: add accessors to `TagSet`
      block: rnull: add polled completion support
      block: rnull: add REQ_OP_FLUSH support
      block: rust: add request flags abstraction
      block: rust: add abstraction for block queue feature flags
      block: rust: allow setting write cache and FUA flags for `GenDisk`
      block: rust: add `Segment::copy_to_page_limit`
      block: rnull: add fua support
      block: rust: add `GenDisk::tag_set`
      block: rust: add `TagSet::update_hw_queue_count`
      block: rnull: add an option to change the number of hardware queues
      block: rust: add an abstraction for `struct rq_list`
      block: rust: add `queue_rqs` vtable hook
      block: rnull: support queue_rqs
      block: rust: remove the `is_poll` parameter from `queue_rq`
      block: rust: add a debug assert for refcounts
      block: rust: add `TagSet::tag_to_rq`
      block: rust: add `Request::queue_index`
      block: rust: add `Request::requeue`
      block: rust: add `request_timeout` hook
      block: rnull: add fault injection support
      block: rust: add max_sectors option to `GenDiskBuilder`
      block: rnull: allow configuration of the maximum IO size
      block: rust: add `virt_boundary_mask` option to `GenDiskBuilder`
      block: rnull: add `virt_boundary` option
      block: rnull: add `shared_tag_bitmap` config option
      block: rnull: add zone offline and readonly configfs files

 drivers/block/rnull/Kconfig              |   11 +
 drivers/block/rnull/configfs.rs          |  605 +++++++++++++--
 drivers/block/rnull/configfs/macros.rs   |  143 ++++
 drivers/block/rnull/disk_storage.rs      |  326 ++++++++
 drivers/block/rnull/disk_storage/page.rs |   78 ++
 drivers/block/rnull/rnull.rs             | 1198 ++++++++++++++++++++++++++++--
 drivers/block/rnull/util.rs              |   65 ++
 drivers/block/rnull/zoned.rs             |  696 +++++++++++++++++
 rust/bindgen_parameters                  |    6 +
 rust/bindings/bindings_helper.h          |   55 ++
 rust/helpers/blk.c                       |   47 ++
 rust/kernel/block.rs                     |  101 ++-
 rust/kernel/block/badblocks.rs           |  716 ++++++++++++++++++
 rust/kernel/block/bio.rs                 |  147 ++++
 rust/kernel/block/bio/vec.rs             |  448 +++++++++++
 rust/kernel/block/mq.rs                  |   78 +-
 rust/kernel/block/mq/feature.rs          |   76 ++
 rust/kernel/block/mq/gen_disk.rs         |  336 +++++++--
 rust/kernel/block/mq/operations.rs       |  489 +++++++++++-
 rust/kernel/block/mq/request.rs          |  677 ++++++++++++++---
 rust/kernel/block/mq/request/command.rs  |   65 ++
 rust/kernel/block/mq/request/flag.rs     |   65 ++
 rust/kernel/block/mq/request_list.rs     |  119 +++
 rust/kernel/block/mq/request_queue.rs    |   60 ++
 rust/kernel/block/mq/tag_set.rs          |  299 +++++++-
 rust/kernel/block/mq/tag_set/flags.rs    |   29 +
 rust/kernel/error.rs                     |    3 +-
 rust/kernel/page.rs                      |    2 +-
 rust/kernel/time/hrtimer.rs              |    5 +-
 29 files changed, 6603 insertions(+), 342 deletions(-)
---
base-commit: 9e0898f1c0f134c6bad146ca8578f73c3e40ac0a
change-id: 20260215-rnull-v6-19-rc5-send-98c33ec692d6
prerequisite-change-id: 20250305-unique-ref-29fcd675f9e9:v17
prerequisite-patch-id: 6c6a7fdd56627293ec3bba61c495f16a0858700c
prerequisite-patch-id: c1958590235ee32d6ddb31ea168105bd9cf248f2
prerequisite-patch-id: c5a4b231dc8adf37e93ebdce308dacbe6a244bf3
prerequisite-patch-id: 541dba7938ba874f8d17fee05a36b1cd9fa2c4d7
prerequisite-patch-id: 3668fd640e4c411bae0c8ea9d986c3fa5d3c9e82
prerequisite-patch-id: da1274864841e267697be9529a50531126c64872
prerequisite-patch-id: c1463b6578e94b56d2bad41f6e614b5286fb1db3
prerequisite-patch-id: a31185fe1abbf553377d6d695c5d206eebc84358
prerequisite-patch-id: 4f392b5736e55a354ec3022644389f89b52fda42
prerequisite-patch-id: b6388ff0ebdd54610010d72a5398842a3c668bbf
prerequisite-change-id: 20251203-xarray-entry-send-00230f0744e6:v4
prerequisite-patch-id: 5d797523ed1bb94597570b6faa4cacea8d94b4f7
prerequisite-patch-id: f82bffce83d85ad4dd0bc9dab876e31c4500d467
prerequisite-patch-id: bc00e3c0a3694d8d490c782bc24b2a5786350da7
prerequisite-patch-id: 39c26c865ad383b133a742e5998e2b1f54999908
prerequisite-patch-id: 4082a1ae45104c2f3170197e186d83db552f9302
prerequisite-patch-id: de0c55224727e169d151d68a5316f0ae4549e4b8
prerequisite-patch-id: 57c6d2464a380542b5283817666540d2c97b0b61
prerequisite-patch-id: c788013f9319aa91f51f74f92f43cf7f2c04496f
prerequisite-patch-id: 959c962400d8595cc55b4f1b3a5501c2290a7d0e
prerequisite-patch-id: 66ed5c6a31fe2d775b5bc70774e3148fa3d860e5
prerequisite-patch-id: 869aa913843e11b467890ed35a1455458dbf3de4
prerequisite-change-id: 20260206-xarray-lockdep-fix-10f1cc68e5d7:v2
prerequisite-patch-id: e871db17a721fede1b7419b8236229190449885b
prerequisite-change-id: 20260130-page-volatile-io-05ff595507d3:v4
prerequisite-patch-id: 09224764d69c35c18e6fec846d4b7ba33c0e9cac
prerequisite-patch-id: cfd909257db3f5811c94d52ac2fc31cf220560c3
prerequisite-change-id: 20260128-gfp-noio-fbd41e135088:v2
prerequisite-patch-id: 420a09fdd0f2758f4d46228f99f29ff82f2d05f3
prerequisite-change-id: 20260212-impl-flags-inner-c61974b27b18:v2
prerequisite-patch-id: 379fb78c07b554278fae3c42d84d62bcfcfa0d45
prerequisite-change-id: 20260214-pin-slice-init-e8ef96fc07b9:v2
prerequisite-patch-id: cdf4e4b2b8c43bcb54b3ddf13a02e28c0e11e9ce
prerequisite-change-id: 20260215-page-additions-bc36046e9ffd:v2
prerequisite-patch-id: 6c6a7fdd56627293ec3bba61c495f16a0858700c
prerequisite-patch-id: c1958590235ee32d6ddb31ea168105bd9cf248f2
prerequisite-patch-id: c5a4b231dc8adf37e93ebdce308dacbe6a244bf3
prerequisite-patch-id: 541dba7938ba874f8d17fee05a36b1cd9fa2c4d7
prerequisite-patch-id: 3668fd640e4c411bae0c8ea9d986c3fa5d3c9e82
prerequisite-patch-id: da1274864841e267697be9529a50531126c64872
prerequisite-patch-id: c1463b6578e94b56d2bad41f6e614b5286fb1db3
prerequisite-patch-id: a31185fe1abbf553377d6d695c5d206eebc84358
prerequisite-patch-id: 4f392b5736e55a354ec3022644389f89b52fda42
prerequisite-patch-id: b6388ff0ebdd54610010d72a5398842a3c668bbf
prerequisite-patch-id: 1f57b529c53f4a650cbeeb7c1ff81653cb95e7f3
prerequisite-patch-id: 4d71a95c2d1a6a36339a9feda6296c33ec86f258
prerequisite-change-id: 20260215-cpu-helpers-08efb2572487:v2
prerequisite-patch-id: fd7f24bed247075d1946f9f526390772afb45236
prerequisite-patch-id: 7d243f4cd29a08a1eb2ca0e0e976fa82f0760f11
prerequisite-change-id: 20260215-export-do-unlocked-00a6ac9373d4:v2
prerequisite-patch-id: c65f4a3078f1acc1b77ea28b531e54664187dbce
prerequisite-change-id: 20260215-impl-flags-additions-0340ffcba5b9:v2
prerequisite-patch-id: 379fb78c07b554278fae3c42d84d62bcfcfa0d45
prerequisite-patch-id: 04c7db66a06be7a2566a23328d2c485ce24f1bb8
prerequisite-patch-id: 4d78d6d7aae15c51e6a1df2cb393392fb7ea90de
prerequisite-change-id: 20260215-ringbuffer-42455964aaf2:v2
prerequisite-patch-id: 44924a030c52ae111983078f1225510e9dc0c009
prerequisite-change-id: 20260215-configfs-c-default-groups-bdb0a44633a6:v2
prerequisite-patch-id: 03b8e71b79be89a73946f3c1f7248671c28ccd42
prerequisite-change-id: 20260215-unique-arc-as-ptr-32eb209dde1b:v2
prerequisite-patch-id: 20f44fe6cfe6b0e52b614bd64469fbf1df5c1e94
prerequisite-change-id: 20260215-rust-fault-inject-bc62f1083502:v2
prerequisite-patch-id: 03b8e71b79be89a73946f3c1f7248671c28ccd42
prerequisite-patch-id: 8b287be6364945d10e661e0828ad17b023f487e1
prerequisite-change-id: 20260215-hrtimer-active-f183411fe56b:v2
prerequisite-patch-id: e029dd2cb097192e597417e40d7d23bedaa79370
prerequisite-change-id: 20260529-modules-value-ref-e95a7ab94fdb:v2
prerequisite-patch-id: 618f9f3cfea3f8a03db5e73229d77b48f6549ab4
prerequisite-message-id: <20260411130254.3510128-2-wenzhaoliao@ruc.edu.cn>
prerequisite-patch-id: f714b166f93e453dddd01ed17c976b53e6da4957
prerequisite-change-id: 20260608-queue-data-sync-80b66ab312ac:v1
prerequisite-patch-id: ec86c4ec1531441a2c19085bf24ecc06819d7420
prerequisite-change-id: 20260608-update-hw-nodes-arg-940ecec0380a:v1
prerequisite-patch-id: a1e95b0ec36bf18976553fb8a2e17fd1527a6a1a
prerequisite-change-id: 20260608-configfs-fix-offset-6b3117158901:v1
prerequisite-patch-id: e8355bdd4444f8bda2663aa0bdcf3336de126255
prerequisite-change-id: 20260608-numa-node-id-85de708d4e8d:v1
prerequisite-patch-id: 8b82a179a91cd3e0ca8396eff81dae7bf66e5349

Best regards,
--  
Andreas Hindborg <a.hindborg@kernel.org>




* [PATCH v2 61/83] block: rust: add abstraction for block queue feature flags
From: Andreas Hindborg @ 2026-06-09 19:08 UTC
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add the `Feature` enum and `Features` type as Rust abstractions for the
C `blk_features_t` bitfield. These types wrap the `BLK_FEAT_*` flags
and allow drivers to describe block device capabilities such as write
cache support, FUA, rotational media, and DAX.
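
To illustrate the flag/flag-set split described above, here is a minimal userspace sketch that does not use the kernel's `impl_flags!` macro. The three variants and their bit positions are illustrative stand-ins for the `bindings::BLK_FEAT_*` constants, and the `contains` method and `|` operators are assumptions about the shape of the generated API, not a copy of it.

```rust
use std::ops::BitOr;

/// A single feature flag. Discriminants are illustrative bit positions,
/// standing in for the `bindings::BLK_FEAT_*` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum Feature {
    WriteCache = 1 << 0,
    ForcedUnitAccess = 1 << 1,
    Rotational = 1 << 2,
}

/// A set of zero or more `Feature` flags stored as a bitmask.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Features(u32);

impl Features {
    /// Returns `true` if `flag` is present in the set.
    pub fn contains(self, flag: Feature) -> bool {
        (self.0 & (flag as u32)) != 0
    }
}

impl From<Feature> for Features {
    fn from(flag: Feature) -> Self {
        Features(flag as u32)
    }
}

/// `Feature | Feature` yields a set containing both flags.
impl BitOr for Feature {
    type Output = Features;
    fn bitor(self, rhs: Feature) -> Features {
        Features((self as u32) | (rhs as u32))
    }
}

/// `Features | Feature` adds one flag to an existing set.
impl BitOr<Feature> for Features {
    type Output = Features;
    fn bitor(self, rhs: Feature) -> Features {
        Features(self.0 | (rhs as u32))
    }
}
```

With these definitions, a driver-like caller could write `Feature::WriteCache | Feature::ForcedUnitAccess` and later test the result with `contains`. Keeping the enum and the set as distinct types means a single flag cannot be mistaken for an arbitrary bitmask.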

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 rust/bindings/bindings_helper.h | 15 +++++++-
 rust/kernel/block/mq.rs         |  5 +++
 rust/kernel/block/mq/feature.rs | 76 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 7acda3ae9725..af0330b9e491 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -119,7 +119,6 @@ const gfp_t RUST_CONST_HELPER_GFP_NOWAIT = GFP_NOWAIT;
 const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO;
 const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM;
 const gfp_t RUST_CONST_HELPER___GFP_NOWARN = ___GFP_NOWARN;
-const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ROTATIONAL = BLK_FEAT_ROTATIONAL;
 const blk_status_t RUST_CONST_HELPER_BLK_STS_OK = BLK_STS_OK;
 const blk_status_t RUST_CONST_HELPER_BLK_STS_NOTSUPP = BLK_STS_NOTSUPP;
 const blk_status_t RUST_CONST_HELPER_BLK_STS_TIMEOUT = BLK_STS_TIMEOUT;
@@ -139,7 +138,21 @@ const blk_status_t RUST_CONST_HELPER_BLK_STS_ZONE_ACTIVE_RESOURCE = BLK_STS_ZONE
 const blk_status_t RUST_CONST_HELPER_BLK_STS_OFFLINE = BLK_STS_OFFLINE;
 const blk_status_t RUST_CONST_HELPER_BLK_STS_DURATION_LIMIT = BLK_STS_DURATION_LIMIT;
 const blk_status_t RUST_CONST_HELPER_BLK_STS_INVAL = BLK_STS_INVAL;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_WRITE_CACHE = BLK_FEAT_WRITE_CACHE;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_FUA = BLK_FEAT_FUA;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ROTATIONAL = BLK_FEAT_ROTATIONAL;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ADD_RANDOM = BLK_FEAT_ADD_RANDOM;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_IO_STAT = BLK_FEAT_IO_STAT;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_STABLE_WRITES = BLK_FEAT_STABLE_WRITES;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_SYNCHRONOUS = BLK_FEAT_SYNCHRONOUS;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_NOWAIT = BLK_FEAT_NOWAIT;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_DAX = BLK_FEAT_DAX;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_POLL = BLK_FEAT_POLL;
 const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ZONED = BLK_FEAT_ZONED;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_PCI_P2PDMA = BLK_FEAT_PCI_P2PDMA;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_SKIP_TAGSET_QUIESCE = BLK_FEAT_SKIP_TAGSET_QUIESCE;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE = BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
+const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ATOMIC_WRITES = BLK_FEAT_ATOMIC_WRITES;
 const blk_opf_t RUST_CONST_HELPER_REQ_FAILFAST_DEV = REQ_FAILFAST_DEV;
 const blk_opf_t RUST_CONST_HELPER_REQ_FAILFAST_TRANSPORT = REQ_FAILFAST_TRANSPORT;
 const blk_opf_t RUST_CONST_HELPER_REQ_FAILFAST_DRIVER = REQ_FAILFAST_DRIVER;
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index 9bad95d79230..7c346be843e1 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -125,12 +125,17 @@
 //! # Ok::<(), kernel::error::Error>(())
 //! ```
 
+mod feature;
 pub mod gen_disk;
 mod operations;
 mod request;
 mod request_queue;
 pub mod tag_set;
 
+pub use feature::{
+    Feature,
+    Features, //
+};
 pub use operations::{
     IoCompletionBatch,
     Operations, //
diff --git a/rust/kernel/block/mq/feature.rs b/rust/kernel/block/mq/feature.rs
new file mode 100644
index 000000000000..015d7925d5f0
--- /dev/null
+++ b/rust/kernel/block/mq/feature.rs
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Block device feature flags.
+//!
+//! This module provides Rust abstractions for the C `blk_features_t` type and
+//! the associated `BLK_FEAT_*` flags defined in `include/linux/blkdev.h`.
+
+use crate::{
+    bindings,
+    impl_flags, //
+};
+
+impl_flags! {
+    /// A set of block device feature flags.
+    ///
+    /// This type wraps the C `blk_features_t` bitfield and represents a
+    /// combination of zero or more [`Feature`] flags. It is used to describe
+    /// the capabilities of a block device in [`struct queue_limits`].
+    ///
+    /// [`struct queue_limits`]: srctree/include/linux/blkdev.h
+    #[derive(Debug, Clone, Default, Copy, PartialEq, Eq)]
+    pub struct Features(u32);
+
+    /// A block device feature flag.
+    ///
+    /// Each variant corresponds to a `BLK_FEAT_*` constant defined in
+    /// `include/linux/blkdev.h`. These flags describe individual capabilities
+    /// or properties of a block device.
+    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+    pub enum Feature {
+        /// Supports a volatile write cache.
+        WriteCache = bindings::BLK_FEAT_WRITE_CACHE,
+
+        /// Supports passing on the FUA bit.
+        ForcedUnitAccess = bindings::BLK_FEAT_FUA,
+
+        /// Rotational device (hard drive or floppy).
+        Rotational = bindings::BLK_FEAT_ROTATIONAL,
+
+        /// Contributes to the random number pool.
+        AddRandom = bindings::BLK_FEAT_ADD_RANDOM,
+
+        /// Enables disk/partitions I/O accounting.
+        IoStat = bindings::BLK_FEAT_IO_STAT,
+
+        /// Don't modify data until writeback is done.
+        StableWrites = bindings::BLK_FEAT_STABLE_WRITES,
+
+        /// Always completes in submit context.
+        Synchronous = bindings::BLK_FEAT_SYNCHRONOUS,
+
+        /// Supports REQ_NOWAIT.
+        Nowait = bindings::BLK_FEAT_NOWAIT,
+
+        /// Supports DAX.
+        Dax = bindings::BLK_FEAT_DAX,
+
+        /// Supports I/O polling.
+        Poll = bindings::BLK_FEAT_POLL,
+
+        /// Is a zoned device.
+        Zoned = bindings::BLK_FEAT_ZONED,
+
+        /// Supports PCI(e) p2p requests.
+        PciP2Pdma = bindings::BLK_FEAT_PCI_P2PDMA,
+
+        /// Skips this queue in `blk_mq_(un)quiesce_tagset`.
+        SkipTagsetQuiesce = bindings::BLK_FEAT_SKIP_TAGSET_QUIESCE,
+
+        /// Undocumented magic for bcache.
+        RaidPartialStripesExpensive = bindings::BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE,
+
+        /// Atomic writes enabled.
+        AtomicWrites = bindings::BLK_FEAT_ATOMIC_WRITES,
+    }
+}

-- 
2.51.2




* [PATCH v2 53/83] block: rnull: add zoned storage support
From: Andreas Hindborg @ 2026-06-09 19:08 UTC
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add zoned block device emulation to rnull. When enabled via the `zoned`
configfs attribute, the driver emulates a zoned storage device with
configurable zone size and zone count.

The implementation supports the zone management operations zone reset,
zone open, zone close, and zone finish. A write pointer is tracked for
each sequential-write-required zone.
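
As a rough illustration of what write pointer tracking involves, the sketch below models a single sequential-write-required zone in plain userspace Rust. It is not the code in `zoned.rs`: the type names, the reduced set of zone conditions, the error variants, and the choice to move the write pointer to the end of the writable capacity on finish are all assumptions made to keep the example short.

```rust
/// Simplified zone conditions (a real device has more states).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ZoneCond {
    Empty,
    Open,
    Closed,
    Full,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ZoneError {
    /// The write does not start at the write pointer.
    Unaligned,
    /// The zone is full or the write would exceed its capacity.
    NoSpace,
}

/// One sequential-write-required zone; all units are sectors.
pub struct Zone {
    pub start: u64,
    pub capacity: u64,
    pub wp: u64,
    pub cond: ZoneCond,
}

impl Zone {
    pub fn new(start: u64, capacity: u64) -> Self {
        Zone { start, capacity, wp: start, cond: ZoneCond::Empty }
    }

    fn end(&self) -> u64 {
        self.start + self.capacity
    }

    /// Accept a write only if it lands exactly on the write pointer
    /// and fits in the remaining capacity, then advance the pointer.
    pub fn write(&mut self, sector: u64, nr_sectors: u64) -> Result<(), ZoneError> {
        if self.cond == ZoneCond::Full {
            return Err(ZoneError::NoSpace);
        }
        if sector != self.wp {
            return Err(ZoneError::Unaligned);
        }
        if self.wp + nr_sectors > self.end() {
            return Err(ZoneError::NoSpace);
        }
        self.wp += nr_sectors;
        self.cond = if self.wp == self.end() { ZoneCond::Full } else { ZoneCond::Open };
        Ok(())
    }

    /// Zone reset: discard contents and rewind the write pointer.
    pub fn reset(&mut self) {
        self.wp = self.start;
        self.cond = ZoneCond::Empty;
    }

    /// Zone open: only empty or closed zones change state.
    pub fn open(&mut self) {
        if matches!(self.cond, ZoneCond::Empty | ZoneCond::Closed) {
            self.cond = ZoneCond::Open;
        }
    }

    /// Zone close: an open zone with no data returns to empty.
    pub fn close(&mut self) {
        if self.cond == ZoneCond::Open {
            self.cond = if self.wp == self.start { ZoneCond::Empty } else { ZoneCond::Closed };
        }
    }

    /// Zone finish: mark the zone full regardless of how much was written.
    pub fn finish(&mut self) {
        self.wp = self.end();
        self.cond = ZoneCond::Full;
    }
}
```

A caller would create a zone with `Zone::new(start, capacity)` and route each write through `write`, which rejects anything that does not begin at the current write pointer. Limits such as the maximum number of open or active zones are left out of this sketch.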

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 drivers/block/rnull/configfs.rs          |  67 +++-
 drivers/block/rnull/disk_storage.rs      |  34 +-
 drivers/block/rnull/disk_storage/page.rs |   4 +-
 drivers/block/rnull/rnull.rs             | 243 +++++++----
 drivers/block/rnull/util.rs              |  65 +++
 drivers/block/rnull/zoned.rs             | 663 +++++++++++++++++++++++++++++++
 6 files changed, 973 insertions(+), 103 deletions(-)

diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index 8fa16dbc2a75..f866595a263c 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -80,7 +80,8 @@ impl AttributeOperations<0> for Config {
         let mut writer = kernel::str::Formatter::new(page);
         writer.write_str(
             "blocksize,size,rotational,irqmode,completion_nsec,memory_backed,\
-             submit_queues,use_per_node_hctx,discard,blocking,shared_tags\n",
+             submit_queues,use_per_node_hctx,discard,blocking,shared_tags,\
+             zoned,zone_size,zone_capacity\n",
         )?;
         Ok(writer.bytes_written())
     }
@@ -118,7 +119,14 @@ fn make_group(
                 mbps: 16,
                 blocking: 17,
                 shared_tags: 18,
-                hw_queue_depth: 19
+                hw_queue_depth: 19,
+                zoned: 20,
+                zone_size: 21,
+                zone_capacity: 22,
+                zone_nr_conv: 23,
+                zone_max_open: 24,
+                zone_max_active: 25,
+                zone_append_max_sectors: 26,
             ],
         };
 
@@ -145,16 +153,20 @@ fn make_group(
                     bad_blocks: Arc::pin_init(BadBlocks::new(false), GFP_KERNEL)?,
                     bad_blocks_once: false,
                     bad_blocks_partial_io: false,
-                    disk_storage: Arc::pin_init(
-                        DiskStorage::new(0, block_size as usize),
-                        GFP_KERNEL
-                    )?,
+                    disk_storage: Arc::pin_init(DiskStorage::new(0, block_size), GFP_KERNEL)?,
                     cache_size_mib: 0,
                     mbps: 0,
                     blocking: false,
                     shared_tags: false,
                     shared_tag_set: self.shared_tag_set.clone(),
                     hw_queue_depth: 64,
+                    zoned: false,
+                    zone_size_mib: 256,
+                    zone_capacity_mib: 0,
+                    zone_nr_conv: 0,
+                    zone_max_open: 0,
+                    zone_max_active: 0,
+                    zone_append_max_sectors: u32::MAX,
                 }),
             }),
             core::iter::empty(),
@@ -234,6 +246,13 @@ struct DeviceConfigInner {
     shared_tags: bool,
     shared_tag_set: Arc<TagSet<NullBlkDevice>>,
     hw_queue_depth: u32,
+    zoned: bool,
+    zone_size_mib: u32,
+    zone_capacity_mib: u32,
+    zone_nr_conv: u32,
+    zone_max_open: u32,
+    zone_max_active: u32,
+    zone_append_max_sectors: u32,
 }
 
 #[vtable]
@@ -257,11 +276,24 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
         let mut guard = this.data.lock();
 
         if !guard.powered && power_op {
+            // We protect zone state with a mutex, so we require blocking queues for zone emulation.
+            if guard.shared_tags && guard.zoned {
+                if !guard
+                    .shared_tag_set
+                    .flags()
+                    .contains(kernel::block::mq::tag_set::Flag::Blocking)
+                {
+                    return Err(EINVAL);
+                }
+            } else if guard.zoned && !guard.blocking {
+                return Err(EINVAL);
+            }
+
             guard.disk = Some(NullBlkDevice::new(crate::NullBlkOptions {
                 name: &guard.name,
-                block_size: guard.block_size,
+                block_size_bytes: guard.block_size,
                 rotational: guard.rotational,
-                capacity_mib: guard.capacity_mib,
+                device_capacity_mib: guard.capacity_mib,
                 irq_mode: guard.irq_mode,
                 completion_time: guard.completion_time,
                 discard: guard.discard,
@@ -279,6 +311,13 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
                     no_sched: guard.no_sched,
                     hw_queue_depth: guard.hw_queue_depth,
                 },
+                zoned: guard.zoned,
+                zone_size_mib: guard.zone_size_mib,
+                zone_capacity_mib: guard.zone_capacity_mib,
+                zone_nr_conv: guard.zone_nr_conv,
+                zone_max_open: guard.zone_max_open,
+                zone_max_active: guard.zone_max_active,
+                zone_append_max_sectors: guard.zone_append_max_sectors,
             })?);
             guard.powered = true;
         } else if guard.powered && !power_op {
@@ -442,10 +481,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
     store: |this, page| store_with_power_check(this, page, |data, page| {
         let text = core::str::from_utf8(page)?.trim();
         let value = text.parse::<u64>().map_err(|_| EINVAL)?;
-        data.disk_storage = Arc::pin_init(
-            DiskStorage::new(value, data.block_size as usize),
-            GFP_KERNEL
-        )?;
+        data.disk_storage = Arc::pin_init(DiskStorage::new(value, data.block_size), GFP_KERNEL)?;
         data.cache_size_mib = value;
         Ok(())
     })
@@ -455,3 +491,10 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
 configfs_simple_bool_field!(DeviceConfig, 17, blocking);
 configfs_simple_bool_field!(DeviceConfig, 18, shared_tags);
 configfs_simple_field!(DeviceConfig, 19, hw_queue_depth, u32);
+configfs_simple_bool_field!(DeviceConfig, 20, zoned);
+configfs_simple_field!(DeviceConfig, 21, zone_size_mib, u32);
+configfs_simple_field!(DeviceConfig, 22, zone_capacity_mib, u32);
+configfs_simple_field!(DeviceConfig, 23, zone_nr_conv, u32);
+configfs_simple_field!(DeviceConfig, 24, zone_max_open, u32);
+configfs_simple_field!(DeviceConfig, 25, zone_max_active, u32);
+configfs_simple_field!(DeviceConfig, 26, zone_append_max_sectors, u32);
diff --git a/drivers/block/rnull/disk_storage.rs b/drivers/block/rnull/disk_storage.rs
index b8fef411fffe..82de1f656f68 100644
--- a/drivers/block/rnull/disk_storage.rs
+++ b/drivers/block/rnull/disk_storage.rs
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 use super::HwQueueContext;
+use crate::util::*;
 use core::pin::Pin;
 use kernel::{
     block,
@@ -9,8 +10,12 @@
     page::PAGE_SIZE,
     prelude::*,
     sync::{
-        atomic::{ordering, Atomic},
-        SpinLock, SpinLockGuard,
+        atomic::{
+            ordering,
+            Atomic, //
+        },
+        SpinLock,
+        SpinLockGuard, //
     },
     uapi::PAGE_SECTORS,
     xarray::{
@@ -31,11 +36,11 @@ pub(crate) struct DiskStorage {
     cache_size: u64,
     cache_size_used: Atomic<u64>,
     next_flush_sector: Atomic<u64>,
-    block_size: usize,
+    block_size: u32,
 }
 
 impl DiskStorage {
-    pub(crate) fn new(cache_size: u64, block_size: usize) -> impl PinInit<Self, Error> {
+    pub(crate) fn new(cache_size: u64, block_size: u32) -> impl PinInit<Self, Error> {
         try_pin_init!( Self {
             // TODO: Get rid of the box
             // https://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git/commit/?h=locking&id=a5d84cafb3e253a11d2e078902c5b090be2f4227
@@ -59,6 +64,27 @@ pub(crate) fn access<'a, 'b, 'c>(
     pub(crate) fn lock(&self) -> SpinLockGuard<'_, Pin<KBox<TreeContainer>>> {
         self.trees.lock()
     }
+
+    pub(crate) fn discard(
+        &self,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
+        mut sector: u64,
+        sectors: u32,
+    ) {
+        let mut tree_guard = self.lock();
+        let mut hw_data_guard = hw_data.lock();
+
+        let mut access = self.access(&mut tree_guard, &mut hw_data_guard, None);
+
+        let mut remaining_bytes = sectors_to_bytes(sectors);
+
+        while remaining_bytes > 0 {
+            access.free_sector(sector);
+            let processed = remaining_bytes.min(self.block_size);
+            sector += Into::<u64>::into(bytes_to_sectors(processed));
+            remaining_bytes -= processed;
+        }
+    }
 }
 
 pub(crate) struct DiskStorageAccess<'a, 'b, 'c> {
diff --git a/drivers/block/rnull/disk_storage/page.rs b/drivers/block/rnull/disk_storage/page.rs
index bc78973ad5d4..88dc9a2476bd 100644
--- a/drivers/block/rnull/disk_storage/page.rs
+++ b/drivers/block/rnull/disk_storage/page.rs
@@ -20,11 +20,11 @@
 pub(crate) struct NullBlockPage {
     page: Owned<SafePage>,
     status: u64,
-    block_size: usize,
+    block_size: u32,
 }
 
 impl NullBlockPage {
-    pub(crate) fn new(block_size: usize) -> Result<KBox<Self>> {
+    pub(crate) fn new(block_size: u32) -> Result<KBox<Self>> {
         memalloc_scope!(let _noio: NoIo);
         Ok(KBox::new(
             Self {
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 5ec17a2674b7..6fb307e33263 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -2,8 +2,13 @@
 
 //! This is a Rust implementation of the C null block driver.
 
+#![recursion_limit = "256"]
+
 mod configfs;
 mod disk_storage;
+mod util;
+#[cfg(CONFIG_BLK_DEV_ZONED)]
+mod zoned;
 
 use configfs::IRQMode;
 use disk_storage::{
@@ -77,6 +82,7 @@
     },
     xarray::XArraySheaf, //
 };
+use util::*;
 
 module! {
     type: NullBlkModule,
@@ -151,6 +157,35 @@
             default: 64,
             description:  "Queue depth for each hardware queue. Default: 64",
         },
+        zoned: bool {
+            default: false,
+            description: "Make the device a host-managed zoned block device. Default: false",
+        },
+        zone_size: u32 {
+            default: 256,
+            description:
+            "Zone size in MB when block device is zoned. Must be a power of two. Default: 256",
+        },
+        zone_capacity: u32 {
+            default: 0,
+            description: "Zone capacity in MB when block device is zoned. Can be less than or equal to zone size. Default: Zone size",
+        },
+        zone_nr_conv: u32 {
+            default: 0,
+            description: "Number of conventional zones when block device is zoned. Default: 0",
+        },
+        zone_max_open: u32 {
+            default: 0,
+            description: "Maximum number of open zones when block device is zoned. Default: 0 (no limit)",
+        },
+        zone_max_active: u32 {
+            default: 0,
+            description: "Maximum number of active zones when block device is zoned. Default: 0 (no limit)",
+        },
+        zone_append_max_sectors: u32 {
+            default: 0,
+            description: "Maximum size of a zone append command (in 512B sectors). Specify 0 for no zone append.",
+        },
     },
 }
 
@@ -195,16 +230,16 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
                 let block_size = module_parameters::bs.value();
                 let disk = NullBlkDevice::new(NullBlkOptions {
                     name: &name,
-                    block_size,
+                    block_size_bytes: block_size,
                     rotational: module_parameters::rotational.value(),
-                    capacity_mib: module_parameters::gb.value() * 1024,
+                    device_capacity_mib: module_parameters::gb.value() * 1024,
                     irq_mode: module_parameters::irqmode.value().try_into()?,
                     completion_time: Delta::from_nanos(completion_time),
                     discard: module_parameters::discard.value(),
                     bad_blocks: Arc::pin_init(BadBlocks::new(false), GFP_KERNEL)?,
                     bad_blocks_once: false,
                     bad_blocks_partial_io: false,
-                    storage: Arc::pin_init(DiskStorage::new(0, block_size as usize), GFP_KERNEL)?,
+                    storage: Arc::pin_init(DiskStorage::new(0, block_size), GFP_KERNEL)?,
                     bandwidth_limit: u64::from(module_parameters::mbps.value()) * 2u64.pow(20),
                     shared_tag_set: module_parameters::shared_tags
                         .value()
@@ -217,6 +252,13 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
                         no_sched,
                         hw_queue_depth,
                     },
+                    zoned: module_parameters::zoned.value(),
+                    zone_size_mib: module_parameters::zone_size.value(),
+                    zone_capacity_mib: module_parameters::zone_capacity.value(),
+                    zone_nr_conv: module_parameters::zone_nr_conv.value(),
+                    zone_max_open: module_parameters::zone_max_open.value(),
+                    zone_max_active: module_parameters::zone_max_active.value(),
+                    zone_append_max_sectors: module_parameters::zone_append_max_sectors.value(),
                 })?;
                 disks.push(disk, GFP_KERNEL)?;
             }
@@ -231,9 +273,9 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
 
 struct NullBlkOptions<'a> {
     name: &'a CStr,
-    block_size: u32,
+    block_size_bytes: u32,
     rotational: bool,
-    capacity_mib: u64,
+    device_capacity_mib: u64,
     irq_mode: IRQMode,
     completion_time: Delta,
     discard: bool,
@@ -244,6 +286,19 @@ struct NullBlkOptions<'a> {
     bandwidth_limit: u64,
     shared_tag_set: Option<Arc<TagSet<NullBlkDevice>>>,
     tag_set: TagSetOptions,
+    zoned: bool,
+    #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(dead_code))]
+    zone_size_mib: u32,
+    #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(dead_code))]
+    zone_capacity_mib: u32,
+    #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(dead_code))]
+    zone_nr_conv: u32,
+    #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(dead_code))]
+    zone_max_open: u32,
+    #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(dead_code))]
+    zone_max_active: u32,
+    #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(dead_code))]
+    zone_append_max_sectors: u32,
 }
 
 #[pin_data]
@@ -252,7 +307,7 @@ struct NullBlkDevice {
     irq_mode: IRQMode,
     completion_time: Delta,
     memory_backed: bool,
-    block_size: usize,
+    block_size_bytes: u32,
     bad_blocks: Arc<BadBlocks>,
     bad_blocks_once: bool,
     bad_blocks_partial_io: bool,
@@ -263,6 +318,9 @@ struct NullBlkDevice {
     #[pin]
     bandwidth_timer_handle: SpinLock<Option<ArcHrTimerHandle<Self>>>,
     disk: SetOnce<Arc<Revocable<GenDiskRef<Self>>>>,
+    #[cfg(CONFIG_BLK_DEV_ZONED)]
+    #[pin]
+    zoned: zoned::ZoneOptions,
 }
 
 struct TagSetOptions {
@@ -314,9 +372,9 @@ fn build_tag_set(options: TagSetOptions) -> Result<Arc<TagSet<Self>>> {
     fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
         let NullBlkOptions {
             name,
-            block_size,
+            block_size_bytes,
             rotational,
-            capacity_mib,
+            device_capacity_mib,
             irq_mode,
             completion_time,
             discard,
@@ -327,6 +385,19 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
             bandwidth_limit,
             shared_tag_set,
             tag_set,
+            zoned,
+            #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(unused_variables))]
+            zone_size_mib,
+            #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(unused_variables))]
+            zone_capacity_mib,
+            #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(unused_variables))]
+            zone_nr_conv,
+            #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(unused_variables))]
+            zone_max_open,
+            #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(unused_variables))]
+            zone_max_active,
+            #[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(unused_variables))]
+            zone_append_max_sectors,
         } = options;
 
         let memory_backed = tag_set.memory_backed;
@@ -337,10 +408,10 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
             Self::build_tag_set(tag_set)?
         };
 
-        let capacity_sectors = capacity_mib << (20 - block::SECTOR_SHIFT);
+        let device_capacity_sectors = mib_to_sectors(device_capacity_mib);
 
         // Prevent overflow in usize/u64 casts
-        if usize::BITS == 32 && capacity_sectors > u32::MAX.into() {
+        if usize::BITS == 32 && device_capacity_sectors > u32::MAX.into() {
             return Err(code::EINVAL);
         }
 
@@ -350,7 +421,7 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
                 irq_mode,
                 completion_time,
                 memory_backed,
-                block_size: block_size as usize,
+                block_size_bytes,
                 bad_blocks,
                 bad_blocks_once,
                 bad_blocks_partial_io,
@@ -359,17 +430,42 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
                 bandwidth_bytes: Atomic::new(0),
                 bandwidth_timer_handle <- new_spinlock!(None),
                 disk: SetOnce::new(),
+                #[cfg(CONFIG_BLK_DEV_ZONED)]
+                zoned <- zoned::ZoneOptions::new(zoned::ZoneOptionsArgs {
+                    enable: zoned,
+                    device_capacity_mib,
+                    block_size_bytes,
+                    zone_size_mib,
+                    zone_capacity_mib,
+                    zone_nr_conv,
+                    zone_max_open,
+                    zone_max_active,
+                    zone_append_max_sectors,
+                })?,
             }),
             GFP_KERNEL,
         )?;
 
         let mut builder = gen_disk::GenDiskBuilder::new()
-            .capacity_sectors(capacity_sectors)
-            .logical_block_size(block_size)?
-            .physical_block_size(block_size)?
+            .capacity_sectors(device_capacity_sectors)
+            .logical_block_size(block_size_bytes)?
+            .physical_block_size(block_size_bytes)?
             .rotational(rotational);
 
-        if memory_backed && discard {
+        #[cfg(CONFIG_BLK_DEV_ZONED)]
+        {
+            builder = builder
+                .zoned(zoned)
+                .zone_size(queue_data.zoned.size_sectors)
+                .zone_append_max(zone_append_max_sectors);
+        }
+
+        if !cfg!(CONFIG_BLK_DEV_ZONED) && zoned {
+            return Err(ENOTSUPP);
+        }
+
+        // TODO: Warn on invalid discard configuration (zoned, memory)
+        if memory_backed && discard && !zoned {
             builder = builder
                 // Max IO size is u32::MAX bytes
                 .max_hw_discard_sectors(ffi::c_uint::MAX >> block::SECTOR_SHIFT);
@@ -393,7 +489,7 @@ fn sheaf_size() -> usize {
     fn preload<'b, 'c>(
         tree_guard: &'b mut SpinLockGuard<'c, Pin<KBox<TreeContainer>>>,
         hw_data_guard: &'b mut SpinLockGuard<'c, HwQueueContext>,
-        block_size: usize,
+        block_size_bytes: u32,
         sheaf: &'b mut Option<XArraySheaf<'c>>,
     ) -> Result {
         match sheaf {
@@ -418,10 +514,9 @@ fn preload<'b, 'c>(
 
         // Another thread may get the lock after we allocate. If this happens, retry.
         while hw_data_guard.page.is_none() {
-            hw_data_guard.page =
-                Some(tree_guard.do_unlocked(|| {
-                    hw_data_guard.do_unlocked(|| NullBlockPage::new(block_size))
-                })?);
+            hw_data_guard.page = Some(tree_guard.do_unlocked(|| {
+                hw_data_guard.do_unlocked(|| NullBlockPage::new(block_size_bytes))
+            })?);
         }
 
         Ok(())
@@ -438,7 +533,7 @@ fn write<'a, 'b, 'c>(
         let mut sheaf: Option<XArraySheaf<'_>> = None;
 
         while !segment.is_empty() {
-            Self::preload(tree_guard, hw_data_guard, self.block_size, &mut sheaf)?;
+            Self::preload(tree_guard, hw_data_guard, self.block_size_bytes, &mut sheaf)?;
 
             let mut access = self.storage.access(tree_guard, hw_data_guard, sheaf);
 
@@ -491,48 +586,23 @@ fn read<'a, 'b, 'c>(
                         >> block::SECTOR_SHIFT;
                 }
                 // CAST: Casting from `usize` to `u64` never overflows.
-                None => sector += segment.zero_page() as u64 >> block::SECTOR_SHIFT,
+                None => sector += bytes_to_sectors(segment.zero_page() as u64),
             }
         }
 
         Ok(())
     }
 
-    fn discard(
-        &self,
-        hw_data: &Pin<&SpinLock<HwQueueContext>>,
-        mut sector: u64,
-        sectors: u32,
-    ) -> Result {
-        let mut tree_guard = self.storage.lock();
-        let mut hw_data_guard = hw_data.lock();
-
-        let mut access = self
-            .storage
-            .access(&mut tree_guard, &mut hw_data_guard, None);
-
-        let mut remaining_bytes = (sectors as usize) << SECTOR_SHIFT;
-
-        while remaining_bytes > 0 {
-            access.free_sector(sector);
-            let processed = remaining_bytes.min(self.block_size);
-            sector += (processed >> SECTOR_SHIFT) as u64;
-            remaining_bytes -= processed;
-        }
-
-        Ok(())
-    }
-
     #[inline(never)]
     fn transfer(
         &self,
         hw_data: &Pin<&SpinLock<HwQueueContext>>,
         rq: &mut Owned<mq::Request<Self>>,
+        command: mq::Command,
         max_sectors: u32,
     ) -> Result {
         let mut sector = rq.sector();
         let max_end_sector = sector + <u32 as Into<u64>>::into(max_sectors);
-        let command = rq.command();
 
         // TODO: Use `PerCpu` to get rid of this lock
         let mut hw_data_guard = hw_data.lock();
@@ -566,6 +636,27 @@ fn transfer(
         Ok(())
     }
 
+    fn handle_regular_command(
+        &self,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
+        rq: &mut Owned<mq::Request<Self>>,
+    ) -> Result {
+        let mut sectors = rq.sectors();
+
+        self.handle_bad_blocks(rq, &mut sectors)?;
+
+        if self.memory_backed {
+            memalloc_scope!(let _noio: NoIo);
+            if rq.command() == mq::Command::Discard {
+                self.storage.discard(hw_data, rq.sector(), sectors);
+            } else {
+                self.transfer(hw_data, rq, rq.command(), sectors)?;
+            }
+        }
+
+        Ok(())
+    }
+
     fn handle_bad_blocks(&self, rq: &mut Owned<mq::Request<Self>>, sectors: &mut u32) -> Result {
         if self.bad_blocks.enabled() {
             let start = rq.sector();
@@ -581,7 +672,7 @@ fn handle_bad_blocks(&self, rq: &mut Owned<mq::Request<Self>>, sectors: &mut u32
                     }
 
                     if self.bad_blocks_partial_io {
-                        let block_size_sectors = (self.block_size >> SECTOR_SHIFT) as u64;
+                        let block_size_sectors = u64::from(bytes_to_sectors(self.block_size_bytes));
                         range.start = align_down(range.start, block_size_sectors);
                         if start < range.start {
                             *sectors = (range.start - start) as u32;
@@ -666,30 +757,6 @@ impl HasHrTimer<Self> for Pdu {
     }
 }
 
-fn is_power_of_two<T>(value: T) -> bool
-where
-    T: core::ops::Sub<T, Output = T>,
-    T: core::ops::BitAnd<Output = T>,
-    T: core::cmp::PartialOrd<T>,
-    T: Copy,
-    T: From<u8>,
-{
-    (value > 0u8.into()) && (value & (value - 1u8.into())) == 0u8.into()
-}
-
-fn align_down<T>(value: T, to: T) -> T
-where
-    T: core::ops::Sub<T, Output = T>,
-    T: core::ops::Not<Output = T>,
-    T: core::ops::BitAnd<Output = T>,
-    T: core::cmp::PartialOrd<T>,
-    T: Copy,
-    T: From<u8>,
-{
-    debug_assert!(is_power_of_two(to));
-    value & !(to - 1u8.into())
-}
-
 #[vtable]
 impl Operations for NullBlkDevice {
     type QueueData = Arc<Self>;
@@ -711,8 +778,6 @@ fn queue_rq(
         rq: Owned<mq::IdleRequest<Self>>,
         _is_last: bool,
     ) -> BlkResult {
-        let mut sectors = rq.sectors();
-
         if this.bandwidth_limit != 0 {
             if !this.bandwidth_timer.active() {
                 drop(this.bandwidth_timer_handle.lock().take());
@@ -738,18 +803,16 @@ fn queue_rq(
 
         let mut rq = rq.start();
 
-        use core::ops::Deref;
-        Self::handle_bad_blocks(this.deref(), &mut rq, &mut sectors)?;
-
-        if this.memory_backed {
-            memalloc_scope!(let _noio: NoIo);
-            if rq.command() == mq::Command::Discard {
-                this.discard(&hw_data, rq.sector(), sectors)?;
-            } else {
-                this.transfer(&hw_data, &mut rq, sectors)?;
-            }
+        #[cfg(CONFIG_BLK_DEV_ZONED)]
+        if this.zoned.enabled {
+            this.handle_zoned_command(&hw_data, &mut rq)?;
+        } else {
+            this.handle_regular_command(&hw_data, &mut rq)?;
         }
 
+        #[cfg(not(CONFIG_BLK_DEV_ZONED))]
+        this.handle_regular_command(&hw_data, &mut rq)?;
+
         match this.irq_mode {
             IRQMode::None => Self::end_request(rq),
             IRQMode::Soft => mq::Request::complete(rq.into()),
@@ -775,4 +838,14 @@ fn complete(rq: ARef<mq::Request<Self>>) {
                 .expect("Failed to complete request"),
         )
     }
+
+    #[cfg(CONFIG_BLK_DEV_ZONED)]
+    fn report_zones(
+        disk: &GenDiskRef<Self>,
+        sector: u64,
+        nr_zones: u32,
+        callback: impl Fn(&bindings::blk_zone, u32) -> Result,
+    ) -> Result<u32> {
+        Self::report_zones_internal(disk, sector, nr_zones, callback)
+    }
 }
diff --git a/drivers/block/rnull/util.rs b/drivers/block/rnull/util.rs
new file mode 100644
index 000000000000..044926c8e284
--- /dev/null
+++ b/drivers/block/rnull/util.rs
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Return true if `value` is a power of two.
+pub(crate) fn is_power_of_two<T>(value: T) -> bool
+where
+    T: core::ops::Sub<T, Output = T>,
+    T: core::ops::BitAnd<Output = T>,
+    T: core::cmp::PartialOrd<T>,
+    T: Copy,
+    T: From<u8>,
+{
+    (value > 0u8.into()) && (value & (value - 1u8.into())) == 0u8.into()
+}
+
+// Round `value` down to the nearest multiple of `to`, which must be a power of
+// two.
+pub(crate) fn align_down<T>(value: T, to: T) -> T
+where
+    T: core::ops::Sub<T, Output = T>,
+    T: core::ops::Not<Output = T>,
+    T: core::ops::BitAnd<Output = T>,
+    T: core::cmp::PartialOrd<T>,
+    T: Copy,
+    T: From<u8>,
+{
+    debug_assert!(is_power_of_two(to));
+    value & !(to - 1u8.into())
+}
+
+// Round `value` up to the next multiple of `to`, which must be a power of two.
+#[cfg(CONFIG_BLK_DEV_ZONED)]
+pub(crate) fn align_up<T>(value: T, to: T) -> T
+where
+    T: core::ops::Sub<T, Output = T>,
+    T: core::ops::Add<T, Output = T>,
+    T: core::ops::BitAnd<Output = T>,
+    T: core::ops::BitOr<Output = T>,
+    T: core::cmp::PartialOrd<T>,
+    T: Copy,
+    T: From<u8>,
+{
+    debug_assert!(is_power_of_two(to));
+    ((value - 1u8.into()) | (to - 1u8.into())) + 1u8.into()
+}
+
+pub(crate) fn mib_to_sectors<T>(mib: T) -> T
+where
+    T: core::ops::Shl<u32, Output = T>,
+{
+    mib << (20 - kernel::block::SECTOR_SHIFT)
+}
+
+pub(crate) fn sectors_to_bytes<T>(sectors: T) -> T
+where
+    T: core::ops::Shl<u32, Output = T>,
+{
+    sectors << kernel::block::SECTOR_SHIFT
+}
+
+pub(crate) fn bytes_to_sectors<T>(bytes: T) -> T
+where
+    T: core::ops::Shr<u32, Output = T>,
+{
+    bytes >> kernel::block::SECTOR_SHIFT
+}
diff --git a/drivers/block/rnull/zoned.rs b/drivers/block/rnull/zoned.rs
new file mode 100644
index 000000000000..808449cc49e1
--- /dev/null
+++ b/drivers/block/rnull/zoned.rs
@@ -0,0 +1,663 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use crate::{
+    util::*,
+    HwQueueContext, //
+};
+use kernel::{
+    bindings,
+    block::mq::{
+        self,
+        gen_disk::GenDiskRef, //
+    },
+    memalloc_scope,
+    new_mutex,
+    new_spinlock,
+    prelude::*,
+    sync::Mutex,
+    sync::SpinLock,
+    types::Owned, //
+};
+
+pub(crate) struct ZoneOptionsArgs {
+    pub(crate) enable: bool,
+    pub(crate) device_capacity_mib: u64,
+    pub(crate) block_size_bytes: u32,
+    pub(crate) zone_size_mib: u32,
+    pub(crate) zone_capacity_mib: u32,
+    pub(crate) zone_nr_conv: u32,
+    pub(crate) zone_max_open: u32,
+    pub(crate) zone_max_active: u32,
+    pub(crate) zone_append_max_sectors: u32,
+}
+
+#[pin_data]
+pub(crate) struct ZoneOptions {
+    pub(crate) enabled: bool,
+    zones: Pin<KBox<[Mutex<ZoneDescriptor>]>>,
+    conventional_count: u32,
+    pub(crate) size_sectors: u32,
+    append_max_sectors: u32,
+    max_open: u32,
+    max_active: u32,
+    #[pin]
+    accounting: SpinLock<ZoneAccounting>,
+}
+
+impl ZoneOptions {
+    pub(crate) fn new(args: ZoneOptionsArgs) -> Result<impl PinInit<Self, Error>> {
+        let ZoneOptionsArgs {
+            enable,
+            device_capacity_mib,
+            block_size_bytes,
+            zone_size_mib,
+            zone_capacity_mib,
+            mut zone_nr_conv,
+            mut zone_max_open,
+            mut zone_max_active,
+            zone_append_max_sectors,
+        } = args;
+
+        if !is_power_of_two(zone_size_mib) {
+            return Err(EINVAL);
+        }
+
+        if zone_capacity_mib > zone_size_mib {
+            return Err(EINVAL);
+        }
+
+        let zone_size_sectors = mib_to_sectors(zone_size_mib);
+        let device_capacity_sectors = mib_to_sectors(device_capacity_mib);
+        let zone_capacity_sectors = mib_to_sectors(zone_capacity_mib);
+        let zone_count: u32 = (align_up(device_capacity_sectors, zone_size_sectors.into())
+            >> zone_size_sectors.ilog2())
+        .try_into()?;
+
+        if zone_nr_conv >= zone_count {
+            zone_nr_conv = zone_count - 1;
+            pr_info!("changed the number of conventional zones to {zone_nr_conv}\n");
+        }
+
+        let zone_append_max_sectors =
+            align_down(zone_append_max_sectors, bytes_to_sectors(block_size_bytes))
+                .min(zone_capacity_sectors);
+
+        let seq_zone_count = zone_count - zone_nr_conv;
+
+        if zone_max_active >= seq_zone_count {
+            zone_max_active = 0;
+            pr_info!("zone_max_active limit disabled, limit >= zone count\n");
+        }
+
+        if zone_max_active != 0 && zone_max_open > zone_max_active {
+            zone_max_open = zone_max_active;
+            pr_info!("changed the maximum number of open zones to {zone_max_open}\n");
+        } else if zone_max_open >= seq_zone_count {
+            zone_max_open = 0;
+            pr_info!("zone_max_open limit disabled, limit >= zone count\n");
+        }
+
+        Ok(try_pin_init!(Self {
+            enabled: enable,
+            zones: init_zone_descriptors(
+                zone_size_sectors,
+                zone_capacity_sectors,
+                zone_count,
+                zone_nr_conv,
+            )?,
+            size_sectors: zone_size_sectors,
+            append_max_sectors: zone_append_max_sectors,
+            max_open: zone_max_open,
+            max_active: zone_max_active,
+            accounting <- new_spinlock!(ZoneAccounting {
+                implicit_open: 0,
+                explicit_open: 0,
+                closed: 0,
+                start_zone: zone_nr_conv,
+            }),
+            conventional_count: zone_nr_conv,
+        }))
+    }
+}
+
+struct ZoneAccounting {
+    implicit_open: u32,
+    explicit_open: u32,
+    closed: u32,
+    start_zone: u32,
+}
+
+pub(crate) fn init_zone_descriptors(
+    zone_size_sectors: u32,
+    zone_capacity_sectors: u32,
+    zone_count: u32,
+    zone_nr_conv: u32,
+) -> Result<Pin<KBox<[Mutex<ZoneDescriptor>]>>> {
+    let zone_capacity_sectors = if zone_capacity_sectors == 0 {
+        zone_size_sectors
+    } else {
+        zone_capacity_sectors
+    };
+
+    KBox::pin_slice(
+        |i| {
+            let sector = i as u64 * Into::<u64>::into(zone_size_sectors);
+            new_mutex!(
+                if i < zone_nr_conv.try_into().expect("Fewer than 2^32 zones") {
+                    ZoneDescriptor {
+                        start_sector: sector,
+                        size_sectors: zone_size_sectors,
+                        capacity_sectors: zone_size_sectors,
+                        kind: ZoneType::Conventional,
+                        write_pointer: sector + Into::<u64>::into(zone_size_sectors),
+                        condition: ZoneCondition::NoWritePointer,
+                    }
+                } else {
+                    ZoneDescriptor {
+                        start_sector: sector,
+                        size_sectors: zone_size_sectors,
+                        capacity_sectors: zone_capacity_sectors,
+                        kind: ZoneType::SequentialWriteRequired,
+                        write_pointer: sector,
+                        condition: ZoneCondition::Empty,
+                    }
+                }
+            )
+        },
+        zone_count as usize,
+        GFP_KERNEL,
+    )
+}
+
+impl super::NullBlkDevice {
+    pub(crate) fn handle_zoned_command(
+        &self,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
+        rq: &mut Owned<mq::Request<Self>>,
+    ) -> Result {
+        use mq::Command::*;
+        match rq.command() {
+            ZoneAppend | Write => self.zoned_write(hw_data, rq)?,
+            ZoneReset | ZoneResetAll | ZoneOpen | ZoneClose | ZoneFinish => {
+                self.zone_management(hw_data, rq)?
+            }
+            _ => self.zoned_read(hw_data, rq)?,
+        }
+
+        Ok(())
+    }
+
+    fn zone_management(
+        &self,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
+        rq: &mut Owned<mq::Request<Self>>,
+    ) -> Result {
+        if rq.command() == mq::Command::ZoneResetAll {
+            for zone in self.zoned.zones_iter() {
+                let mut zone = zone.lock();
+                use ZoneCondition::*;
+                match zone.condition {
+                    Empty | ReadOnly | Offline => continue,
+                    _ => self.zoned.reset_zone(&self.storage, hw_data, &mut zone)?,
+                }
+            }
+
+            return Ok(());
+        }
+
+        let zone = self.zoned.zone(rq.sector())?;
+        let mut zone = zone.lock();
+
+        if zone.condition == ZoneCondition::ReadOnly || zone.condition == ZoneCondition::Offline {
+            return Err(EIO);
+        }
+
+        use mq::Command::*;
+        match rq.command() {
+            ZoneOpen => self.zoned.open_zone(&mut zone, rq.sector()),
+            ZoneClose => self.zoned.close_zone(&mut zone),
+            ZoneReset => self.zoned.reset_zone(&self.storage, hw_data, &mut zone),
+            ZoneFinish => self.zoned.finish_zone(&mut zone, rq.sector()),
+            _ => Err(EIO),
+        }
+    }
+
+    fn zoned_read(
+        &self,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
+        rq: &mut Owned<mq::Request<Self>>,
+    ) -> Result {
+        let zone = self.zoned.zone(rq.sector())?;
+        let zone = zone.lock();
+        if zone.condition == ZoneCondition::Offline {
+            return Err(EINVAL);
+        }
+
+        zone.check_bounds_read(rq.sector(), rq.sectors())?;
+
+        self.handle_regular_command(hw_data, rq)
+    }
+
+    fn zoned_write(
+        &self,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
+        rq: &mut Owned<mq::Request<Self>>,
+    ) -> Result {
+        let zone = self.zoned.zone(rq.sector())?;
+        let mut zone = zone.lock();
+        let append: bool = rq.command() == mq::Command::ZoneAppend;
+
+        if zone.kind == ZoneType::Conventional {
+            if append {
+                return Err(EINVAL);
+            }
+
+            // NOTE: C driver does not check bounds on write.
+            zone.check_bounds_write(rq.sector(), rq.sectors())?;
+
+            let mut sectors = rq.sectors();
+            self.handle_bad_blocks(rq, &mut sectors)?;
+            return self.transfer(hw_data, rq, rq.command(), sectors);
+        }
+
+        // Check zoned write fits within zone
+        if zone.write_pointer + Into::<u64>::into(rq.sectors())
+            > zone.start_sector + Into::<u64>::into(zone.capacity_sectors)
+        {
+            return Err(EINVAL);
+        }
+
+        if append {
+            if self.zoned.append_max_sectors == 0 {
+                return Err(EINVAL);
+            }
+            rq.as_pin_mut().set_sector(zone.write_pointer);
+        }
+
+        // Check write pointer alignment
+        if !append && rq.sector() != zone.write_pointer {
+            return Err(EINVAL);
+        }
+
+        if zone.condition == ZoneCondition::Closed || zone.condition == ZoneCondition::Empty {
+            if self.zoned.use_accounting() {
+                let mut accounting = self.zoned.accounting.lock();
+                self.zoned
+                    .check_zone_resources(&mut accounting, &mut zone, rq.sector())?;
+
+                if zone.condition == ZoneCondition::Closed {
+                    accounting.closed -= 1;
+                    accounting.implicit_open += 1;
+                } else if zone.condition == ZoneCondition::Empty {
+                    accounting.implicit_open += 1;
+                }
+            }
+
+            zone.condition = ZoneCondition::ImplicitOpen;
+        }
+
+        let mut sectors = rq.sectors();
+        self.handle_bad_blocks(rq, &mut sectors)?;
+
+        if self.memory_backed {
+            memalloc_scope!(let _noio: NoIo);
+            self.transfer(hw_data, rq, mq::Command::Write, sectors)?;
+        }
+
+        zone.write_pointer += Into::<u64>::into(sectors);
+        if zone.write_pointer == zone.start_sector + Into::<u64>::into(zone.capacity_sectors) {
+            if self.zoned.use_accounting() {
+                let mut accounting = self.zoned.accounting.lock();
+
+                if zone.condition == ZoneCondition::ExplicitOpen {
+                    accounting.explicit_open -= 1;
+                } else if zone.condition == ZoneCondition::ImplicitOpen {
+                    accounting.implicit_open -= 1;
+                }
+            }
+
+            zone.condition = ZoneCondition::Full;
+        }
+
+        Ok(())
+    }
+
+    pub(crate) fn report_zones_internal(
+        disk: &GenDiskRef<Self>,
+        sector: u64,
+        nr_zones: u32,
+        callback: impl Fn(&bindings::blk_zone, u32) -> Result,
+    ) -> Result<u32> {
+        let device = disk.queue_data();
+        let first_zone = sector >> device.zoned.size_sectors.ilog2();
+
+        let mut count = 0;
+
+        for (i, zone) in device
+            .zoned
+            .zones
+            .split_at(first_zone as usize)
+            .1
+            .iter()
+            .take(nr_zones as usize)
+            .enumerate()
+        {
+            let zone = zone.lock();
+            let descriptor = bindings::blk_zone {
+                start: zone.start_sector,
+                len: zone.size_sectors.into(),
+                wp: zone.write_pointer,
+                capacity: zone.capacity_sectors.into(),
+                type_: zone.kind as u8,
+                cond: zone.condition as u8,
+                ..bindings::blk_zone::zeroed()
+            };
+            drop(zone);
+            callback(&descriptor, i as u32)?;
+
+            count += 1;
+        }
+
+        Ok(count)
+    }
+}
+
+impl ZoneOptions {
+    fn zone_no(&self, sector: u64) -> usize {
+        (sector >> self.size_sectors.ilog2()) as usize
+    }
+
+    fn zone(&self, sector: u64) -> Result<&Mutex<ZoneDescriptor>> {
+        self.zones.get(self.zone_no(sector)).ok_or(EINVAL)
+    }
+
+    fn zones_iter(&self) -> impl Iterator<Item = &Mutex<ZoneDescriptor>> {
+        self.zones.iter()
+    }
+
+    fn use_accounting(&self) -> bool {
+        self.max_active != 0 || self.max_open != 0
+    }
+
+    fn try_close_implicit_open_zone(&self, accounting: &mut ZoneAccounting, sector: u64) -> Result {
+        let skip = self.zone_no(sector) as u32;
+
+        let it = Iterator::chain(
+            self.zones[(accounting.start_zone as usize)..]
+                .iter()
+                .enumerate()
+                .map(|(i, z)| (i + accounting.start_zone as usize, z)),
+            self.zones[(self.conventional_count as usize)..(accounting.start_zone as usize)]
+                .iter()
+                .enumerate()
+                .map(|(i, z)| (i + self.conventional_count as usize, z)),
+        )
+        .filter(|(i, _)| *i != skip as usize);
+
+        for (index, zone) in it {
+            let mut zone = zone.lock();
+            if zone.condition == ZoneCondition::ImplicitOpen {
+                accounting.implicit_open -= 1;
+
+                let index_u32: u32 = index.try_into()?;
+                let next_zone: u32 = index_u32 + 1;
+                accounting.start_zone = if next_zone == self.zones.len().try_into()? {
+                    self.conventional_count
+                } else {
+                    next_zone
+                };
+
+                if zone.write_pointer == zone.start_sector {
+                    zone.condition = ZoneCondition::Empty;
+                } else {
+                    zone.condition = ZoneCondition::Closed;
+                    accounting.closed += 1;
+                }
+                return Ok(());
+            }
+        }
+
+        Err(EINVAL)
+    }
+
+    fn open_zone(&self, zone: &mut ZoneDescriptor, sector: u64) -> Result {
+        if zone.kind == ZoneType::Conventional {
+            return Err(EINVAL);
+        }
+
+        use ZoneCondition::*;
+        match zone.condition {
+            ExplicitOpen => return Ok(()),
+            Empty | ImplicitOpen | Closed => (),
+            _ => return Err(EIO),
+        }
+
+        if self.use_accounting() {
+            let mut accounting = self.accounting.lock();
+            match zone.condition {
+                Empty => {
+                    self.check_zone_resources(&mut accounting, zone, sector)?;
+                }
+                ImplicitOpen => {
+                    accounting.implicit_open -= 1;
+                }
+                Closed => {
+                    self.check_zone_resources(&mut accounting, zone, sector)?;
+                    accounting.closed -= 1;
+                }
+                _ => (),
+            }
+
+            accounting.explicit_open += 1;
+        }
+
+        zone.condition = ExplicitOpen;
+        Ok(())
+    }
+
+    fn check_zone_resources(
+        &self,
+        accounting: &mut ZoneAccounting,
+        zone: &mut ZoneDescriptor,
+        sector: u64,
+    ) -> Result {
+        match zone.condition {
+            ZoneCondition::Empty => {
+                self.check_active_zones(accounting)?;
+                self.check_open_zones(accounting, sector)
+            }
+            ZoneCondition::Closed => self.check_open_zones(accounting, sector),
+            _ => Err(EIO),
+        }
+    }
+
+    fn check_open_zones(&self, accounting: &mut ZoneAccounting, sector: u64) -> Result {
+        if self.max_open == 0 {
+            return Ok(());
+        }
+
+        if self.max_open > accounting.explicit_open + accounting.implicit_open {
+            return Ok(());
+        }
+
+        if accounting.implicit_open > 0 {
+            self.check_active_zones(accounting)?;
+            return self.try_close_implicit_open_zone(accounting, sector);
+        }
+
+        Err(EBUSY)
+    }
+
+    fn check_active_zones(&self, accounting: &mut ZoneAccounting) -> Result {
+        if self.max_active == 0 {
+            return Ok(());
+        }
+
+        if self.max_active > accounting.implicit_open + accounting.explicit_open + accounting.closed
+        {
+            return Ok(());
+        }
+
+        Err(EBUSY)
+    }
+
+    fn close_zone(&self, zone: &mut ZoneDescriptor) -> Result {
+        if zone.kind == ZoneType::Conventional {
+            return Err(EINVAL);
+        }
+
+        use ZoneCondition::*;
+        match zone.condition {
+            Closed => return Ok(()),
+            ImplicitOpen | ExplicitOpen => (),
+            _ => return Err(EIO),
+        }
+
+        if self.use_accounting() {
+            let mut accounting = self.accounting.lock();
+            match zone.condition {
+                ImplicitOpen => accounting.implicit_open -= 1,
+                ExplicitOpen => accounting.explicit_open -= 1,
+                _ => (),
+            }
+
+            if zone.write_pointer > zone.start_sector {
+                accounting.closed += 1;
+            }
+        }
+
+        if zone.write_pointer == zone.start_sector {
+            zone.condition = Empty;
+        } else {
+            zone.condition = Closed;
+        }
+
+        Ok(())
+    }
+
+    fn finish_zone(&self, zone: &mut ZoneDescriptor, sector: u64) -> Result {
+        if zone.kind == ZoneType::Conventional {
+            return Err(EINVAL);
+        }
+
+        if self.use_accounting() {
+            let mut accounting = self.accounting.lock();
+
+            use ZoneCondition::*;
+            match zone.condition {
+                Full => return Ok(()),
+                Empty => {
+                    self.check_zone_resources(&mut accounting, zone, sector)?;
+                }
+                ImplicitOpen => accounting.implicit_open -= 1,
+                ExplicitOpen => accounting.explicit_open -= 1,
+                Closed => {
+                    self.check_zone_resources(&mut accounting, zone, sector)?;
+                    accounting.closed -= 1;
+                }
+                _ => return Err(EIO),
+            }
+        }
+
+        zone.condition = ZoneCondition::Full;
+        zone.write_pointer = zone.start_sector + Into::<u64>::into(zone.size_sectors);
+
+        Ok(())
+    }
+
+    fn reset_zone(
+        &self,
+        storage: &crate::disk_storage::DiskStorage,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
+        zone: &mut ZoneDescriptor,
+    ) -> Result {
+        if zone.kind == ZoneType::Conventional {
+            return Err(EINVAL);
+        }
+
+        if self.use_accounting() {
+            let mut accounting = self.accounting.lock();
+
+            use ZoneCondition::*;
+            match zone.condition {
+                ImplicitOpen => accounting.implicit_open -= 1,
+                ExplicitOpen => accounting.explicit_open -= 1,
+                Closed => accounting.closed -= 1,
+                Empty | Full => (),
+                _ => return Err(EIO),
+            }
+        }
+
+        zone.condition = ZoneCondition::Empty;
+        zone.write_pointer = zone.start_sector;
+
+        storage.discard(hw_data, zone.start_sector, zone.size_sectors);
+
+        Ok(())
+    }
+}
+
+pub(crate) struct ZoneDescriptor {
+    start_sector: u64,
+    size_sectors: u32,
+    kind: ZoneType,
+    capacity_sectors: u32,
+    write_pointer: u64,
+    condition: ZoneCondition,
+}
+
+impl ZoneDescriptor {
+    fn check_bounds_write(&self, sector: u64, sectors: u32) -> Result {
+        if sector + Into::<u64>::into(sectors)
+            > self.start_sector + Into::<u64>::into(self.capacity_sectors)
+        {
+            Err(EIO)
+        } else {
+            Ok(())
+        }
+    }
+
+    fn check_bounds_read(&self, sector: u64, sectors: u32) -> Result {
+        if sector + Into::<u64>::into(sectors) > self.write_pointer {
+            Err(EIO)
+        } else {
+            Ok(())
+        }
+    }
+}
+
+#[derive(Copy, Clone, PartialEq, Eq, Debug)]
+#[repr(u32)]
+enum ZoneType {
+    Conventional = bindings::blk_zone_type_BLK_ZONE_TYPE_CONVENTIONAL,
+    SequentialWriteRequired = bindings::blk_zone_type_BLK_ZONE_TYPE_SEQWRITE_REQ,
+    #[expect(dead_code)]
+    SequentialWritePreferred = bindings::blk_zone_type_BLK_ZONE_TYPE_SEQWRITE_PREF,
+}
+
+impl ZoneType {
+    #[expect(dead_code)]
+    fn as_raw(self) -> u32 {
+        self as u32
+    }
+}
+
+#[derive(Copy, Clone, PartialEq, Eq, Debug)]
+#[repr(u32)]
+enum ZoneCondition {
+    NoWritePointer = bindings::blk_zone_cond_BLK_ZONE_COND_NOT_WP,
+    Empty = bindings::blk_zone_cond_BLK_ZONE_COND_EMPTY,
+    ImplicitOpen = bindings::blk_zone_cond_BLK_ZONE_COND_IMP_OPEN,
+    ExplicitOpen = bindings::blk_zone_cond_BLK_ZONE_COND_EXP_OPEN,
+    Closed = bindings::blk_zone_cond_BLK_ZONE_COND_CLOSED,
+    Full = bindings::blk_zone_cond_BLK_ZONE_COND_FULL,
+    ReadOnly = bindings::blk_zone_cond_BLK_ZONE_COND_READONLY,
+    Offline = bindings::blk_zone_cond_BLK_ZONE_COND_OFFLINE,
+}
+
+impl ZoneCondition {
+    #[expect(dead_code)]
+    fn as_raw(self) -> u32 {
+        self as u32
+    }
+}

-- 
2.51.2




* [PATCH v2 39/83] block: rust: add a request queue abstraction
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add the `RequestQueue` type as a Rust abstraction for `struct
request_queue`. This type provides methods to access the request queue
associated with a `GenDisk` or `Request`.

The abstraction exposes queue-related functionality needed by block
device drivers.
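
As an illustration only, the pointer-to-reference pattern described above can
be sketched in plain Rust outside the kernel. `RawQueue` and `Queue` are
hypothetical stand-ins for `bindings::request_queue` and `RequestQueue`, and
the private data is modelled as a plain `&T` rather than a `ForeignOwnable`
borrow:

```rust
use core::ffi::c_void;
use core::marker::PhantomData;

/// Hypothetical stand-in for the C `struct request_queue`.
#[repr(C)]
struct RawQueue {
    queuedata: *mut c_void,
}

/// Typed view of a `RawQueue`. `repr(transparent)` gives it the same layout
/// as `RawQueue`, so a pointer to one can be reinterpreted as the other.
#[repr(transparent)]
struct Queue<T>(RawQueue, PhantomData<T>);

impl<T> Queue<T> {
    /// # Safety
    ///
    /// `ptr` must be valid for reads for `'a`, and its `queuedata` field must
    /// point to a live `T` for `'a`.
    unsafe fn from_raw<'a>(ptr: *const RawQueue) -> &'a Self {
        // SAFETY: The caller guarantees validity; the layouts are identical.
        unsafe { &*ptr.cast() }
    }

    /// Borrow the private data stored in the queue.
    fn queue_data(&self) -> &T {
        // SAFETY: By the requirements of `from_raw`, `queuedata` is a valid `T`.
        unsafe { &*self.0.queuedata.cast::<T>() }
    }
}
```

The unsafe contract is stated once, at `from_raw`, so that accessors such as
`queue_data` can be safe functions.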

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 rust/kernel/block/mq.rs               |  2 ++
 rust/kernel/block/mq/gen_disk.rs      |  7 ++++
 rust/kernel/block/mq/request_queue.rs | 60 +++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index 77e3593e8626..e89eb394001f 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -127,6 +127,7 @@
 pub mod gen_disk;
 mod operations;
 mod request;
+mod request_queue;
 pub mod tag_set;
 
 pub use operations::Operations;
@@ -135,4 +136,5 @@
     Request,
     RequestTimerHandle, //
 };
+pub use request_queue::RequestQueue;
 pub use tag_set::TagSet;
diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index f51bccb0d2ef..6ba8d88f63a9 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -9,6 +9,7 @@
     bindings,
     block::mq::{
         Operations,
+        RequestQueue,
         TagSet, //
     },
     error::{
@@ -253,6 +254,12 @@ impl<T: Operations> GenDisk<T> {
     pub fn get_ref(&self) -> Arc<Revocable<GenDiskRef<T>>> {
         self.backref.clone()
     }
+
+    /// Get the [`RequestQueue`] associated with this [`GenDisk`].
+    pub fn queue(&self) -> &RequestQueue<T> {
+        // SAFETY: By type invariant, self is a valid gendisk.
+        unsafe { RequestQueue::from_raw((*self.gendisk).queue) }
+    }
 }
 
 // SAFETY: `GenDisk` is an owned pointer to a `struct gendisk` and an `Arc` to a
diff --git a/rust/kernel/block/mq/request_queue.rs b/rust/kernel/block/mq/request_queue.rs
new file mode 100644
index 000000000000..45fb55b1a310
--- /dev/null
+++ b/rust/kernel/block/mq/request_queue.rs
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use super::Operations;
+use crate::types::{
+    ForeignOwnable,
+    Opaque, //
+};
+use core::marker::PhantomData;
+
+/// A structure describing the queues associated with a block device.
+///
+/// Owned by a [`GenDisk`].
+///
+/// # Invariants
+///
+/// - `self.0` is a valid `bindings::request_queue`.
+/// - `self.0.queuedata` is a valid `T::QueueData`.
+#[repr(transparent)]
+pub struct RequestQueue<T>(Opaque<bindings::request_queue>, PhantomData<T>);
+
+impl<T> RequestQueue<T>
+where
+    T: Operations,
+{
+    /// Create a [`RequestQueue`] from a raw `bindings::request_queue` pointer
+    ///
+    /// # Safety
+    ///
+    /// - `ptr` must be valid for use as a reference for the duration of `'a`.
+    /// - `ptr` must have been initialized as part of [`GenDiskBuilder::build`].
+    pub(crate) unsafe fn from_raw<'a>(ptr: *const bindings::request_queue) -> &'a Self {
+        // INVARIANT:
+        // - By function safety requirements, `ptr` is a valid `request_queue`.
+        // - By function safety requirement `ptr` was initialized by [`GenDiskBuilder::build`], and
+        //   thus `queuedata` was set to point to a valid `T::QueueData`.
+        //
+        // SAFETY: By function safety requirements `ptr` is valid for use as a reference.
+        unsafe { &*ptr.cast() }
+    }
+
+    /// Get the driver private data associated with this [`RequestQueue`].
+    pub fn queue_data(&self) -> <T::QueueData as ForeignOwnable>::Borrowed<'_> {
+        // SAFETY: By type invariant, `queuedata` is a valid `T::QueueData`.
+        unsafe { T::QueueData::borrow((*self.0.get()).queuedata) }
+    }
+
+    /// Stop all hardware queues of this [`RequestQueue`].
+    pub fn stop_hw_queues(&self) {
+        // SAFETY: By type invariant, `self.0` is a valid `request_queue`.
+        unsafe { bindings::blk_mq_stop_hw_queues(self.0.get()) }
+    }
+
+    /// Start all hardware queues of this [`RequestQueue`].
+    ///
+    /// This function will mark the queues as ready and, if necessary, schedule the queues to run.
+    pub fn start_stopped_hw_queues_async(&self) {
+        // SAFETY: By type invariant, `self.0` is a valid `request_queue`.
+        unsafe { bindings::blk_mq_start_stopped_hw_queues(self.0.get(), true) }
+    }
+}

-- 
2.51.2




* [PATCH v2 26/83] block: rust: change sector type from usize to u64
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Change the `sector()` and `sectors()` methods in `Request` to return
`u64` and `u32` respectively instead of `usize`. This matches the
underlying kernel types.

Update the rnull driver to handle the new sector types with appropriate
casting throughout the read, write, and discard operations.
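
A minimal standalone sketch of the sector arithmetic involved, assuming
512-byte sectors and 4 KiB pages. The constants are redefined locally and the
helper names `locate` and `capacity_sectors` are hypothetical, not part of the
driver:

```rust
// Assumed values: 512-byte sectors and 4 KiB pages. The kernel takes these
// from its own configuration.
const SECTOR_SHIFT: u32 = 9;
const PAGE_SHIFT: u32 = 12;
const PAGE_SECTORS_SHIFT: u32 = PAGE_SHIFT - SECTOR_SHIFT;
const PAGE_SECTOR_MASK: u32 = (1 << PAGE_SECTORS_SHIFT) - 1;

/// Split a 64-bit sector number into (page index, byte offset within page).
/// Returns `None` if the page index does not fit in `usize`.
fn locate(sector: u64) -> Option<(usize, usize)> {
    let page_idx = usize::try_from(sector >> PAGE_SECTORS_SHIFT).ok()?;
    // The masked value is smaller than the page size, so this cast cannot
    // truncate.
    let page_offset = ((sector & u64::from(PAGE_SECTOR_MASK)) << SECTOR_SHIFT) as usize;
    Some((page_idx, page_offset))
}

/// Convert a capacity in MiB to sectors, rejecting sizes that would break
/// later `u64` to `usize` casts on 32-bit targets.
fn capacity_sectors(capacity_mib: u64) -> Option<u64> {
    let sectors = capacity_mib.checked_mul(1 << (20 - SECTOR_SHIFT))?;
    if usize::BITS == 32 && sectors > u64::from(u32::MAX) {
        return None;
    }
    Some(sectors)
}
```

Under these assumptions, sector 9 maps to page index 1 at byte offset 512,
and a 1 MiB device has 2048 sectors.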

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 drivers/block/rnull/rnull.rs    | 71 +++++++++++++++++++++++++----------------
 rust/kernel/block/mq/request.rs |  8 ++---
 2 files changed, 47 insertions(+), 32 deletions(-)

diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index cb5b642f68e5..73f14d6e379f 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -218,6 +218,13 @@ fn new(options: NullBlkOptions<'_>) -> Result<GenDisk<Self>> {
             kernel::alloc::NumaNode::new(home_node)?
         };
 
+        let capacity_sectors = capacity_mib << (20 - block::SECTOR_SHIFT);
+
+        // Prevent overflow in usize/u64 casts
+        if usize::BITS == 32 && capacity_sectors > u32::MAX.into() {
+            return Err(code::EINVAL);
+        }
+
         let tagset = Arc::pin_init(
             TagSet::new(submit_queues, 256, 1, numa_node, flags),
             GFP_KERNEL,
@@ -229,13 +236,13 @@ fn new(options: NullBlkOptions<'_>) -> Result<GenDisk<Self>> {
                 irq_mode,
                 completion_time,
                 memory_backed,
-                block_size: block_size as usize,
+                block_size: block_size.into(),
             }),
             GFP_KERNEL,
         )?;
 
         let mut builder = gen_disk::GenDiskBuilder::new()
-            .capacity_sectors(capacity_mib << (20 - block::SECTOR_SHIFT))
+            .capacity_sectors(capacity_sectors)
             .logical_block_size(block_size)?
             .physical_block_size(block_size)?
             .rotational(rotational);
@@ -250,12 +257,13 @@ fn new(options: NullBlkOptions<'_>) -> Result<GenDisk<Self>> {
     }
 
     #[inline(always)]
-    fn write(tree: &XArray<TreeNode>, mut sector: usize, mut segment: Segment<'_>) -> Result {
+    fn write(tree: &XArray<TreeNode>, mut sector: u64, mut segment: Segment<'_>) -> Result {
         while !segment.is_empty() {
             let page = NullBlockPage::new()?;
             let mut tree = tree.lock();
 
-            let page_idx = sector >> block::PAGE_SECTORS_SHIFT;
+            // CAST: Device size limited during setup to (2^32)-1 on 32 bit systems.
+            let page_idx = (sector >> block::PAGE_SECTORS_SHIFT) as usize;
 
             let page = if let Some(page) = tree.get_mut(page_idx) {
                 page
@@ -265,43 +273,50 @@ fn write(tree: &XArray<TreeNode>, mut sector: usize, mut segment: Segment<'_>) -
             };
 
             page.set_occupied(sector);
-            let page_offset = (sector & block::PAGE_SECTOR_MASK as usize) << block::SECTOR_SHIFT;
-            sector +=
-                segment.copy_to_page(page.page.as_pin_mut(), page_offset) >> block::SECTOR_SHIFT;
+
+            // CAST: Page offset always fits in 32 bits.
+            let page_offset =
+                ((sector & u64::from(block::PAGE_SECTOR_MASK)) << block::SECTOR_SHIFT) as usize;
+
+            // CAST: Casting from `usize` to `u64` never overflows.
+            sector += segment.copy_to_page(page.page.as_pin_mut(), page_offset) as u64
+                >> block::SECTOR_SHIFT;
         }
         Ok(())
     }
 
     #[inline(always)]
-    fn read(tree: &XArray<TreeNode>, mut sector: usize, mut segment: Segment<'_>) -> Result {
+    fn read(tree: &XArray<TreeNode>, mut sector: u64, mut segment: Segment<'_>) -> Result {
         let tree = tree.lock();
 
         while !segment.is_empty() {
-            let idx = sector >> block::PAGE_SECTORS_SHIFT;
+            // CAST: Device size limited during setup to (2^32)-1 on 32 bit systems.
+            let page_idx = (sector >> block::PAGE_SECTORS_SHIFT) as usize;
 
-            if let Some(page) = tree.get(idx) {
+            if let Some(page) = tree.get(page_idx) {
+                // CAST: Page offset always fits in 32 bits.
                 let page_offset =
-                    (sector & block::PAGE_SECTOR_MASK as usize) << block::SECTOR_SHIFT;
-                sector += segment.copy_from_page(&page.page, page_offset) >> block::SECTOR_SHIFT;
+                    ((sector & u64::from(block::PAGE_SECTOR_MASK)) << block::SECTOR_SHIFT) as usize;
+
+                // CAST: Casting from `usize` to `u64` never overflows.
+                sector +=
+                    segment.copy_from_page(&page.page, page_offset) as u64 >> block::SECTOR_SHIFT;
             } else {
-                sector += segment.zero_page() >> block::SECTOR_SHIFT;
+                // CAST: Casting from `usize` to `u64` never overflows.
+                sector += segment.zero_page() as u64 >> block::SECTOR_SHIFT;
             }
         }
 
         Ok(())
     }
 
-    fn discard(
-        tree: &XArray<TreeNode>,
-        mut sector: usize,
-        sectors: usize,
-        block_size: usize,
-    ) -> Result {
+    fn discard(tree: &XArray<TreeNode>, mut sector: u64, sectors: u64, block_size: u64) -> Result {
         let mut remaining_bytes = sectors << SECTOR_SHIFT;
         let mut tree = tree.lock();
 
         while remaining_bytes > 0 {
-            let page_idx = sector >> block::PAGE_SECTORS_SHIFT;
+            // CAST: Device size limited during setup to (2^32)-1 on 32 bit systems.
+            let page_idx = (sector >> block::PAGE_SECTORS_SHIFT) as usize;
             let mut remove = false;
             if let Some(page) = tree.get_mut(page_idx) {
                 page.set_free(sector);
@@ -326,7 +341,7 @@ fn discard(
     fn transfer(
         command: bindings::req_op,
         tree: &XArray<TreeNode>,
-        sector: usize,
+        sector: u64,
         segment: Segment<'_>,
     ) -> Result {
         match command {
@@ -356,13 +371,13 @@ fn new() -> Result<KBox<Self>> {
         )?)
     }
 
-    fn set_occupied(&mut self, sector: usize) {
-        let idx = sector & PAGE_SECTOR_MASK as usize;
+    fn set_occupied(&mut self, sector: u64) {
+        let idx = sector & u64::from(PAGE_SECTOR_MASK);
         self.status |= 1 << idx;
     }
 
-    fn set_free(&mut self, sector: usize) {
-        let idx = sector & PAGE_SECTOR_MASK as usize;
+    fn set_free(&mut self, sector: u64) {
+        let idx = sector & u64::from(PAGE_SECTOR_MASK);
         self.status &= !(1 << idx);
     }
 
@@ -380,7 +395,7 @@ struct QueueData {
     irq_mode: IRQMode,
     completion_time: Delta,
     memory_backed: bool,
-    block_size: usize,
+    block_size: u64,
 }
 
 #[pin_data]
@@ -432,14 +447,14 @@ fn queue_rq(
             let mut sector = rq.sector();
 
             if command == bindings::req_op_REQ_OP_DISCARD {
-                Self::discard(tree, sector, rq.sectors(), queue_data.block_size)?;
+                Self::discard(tree, sector, rq.sectors().into(), queue_data.block_size)?;
             } else {
                 for bio in rq.bio_iter_mut() {
                     let segment_iter = bio.segment_iter();
                     for segment in segment_iter {
                         let length = segment.len();
                         Self::transfer(command, tree, sector, segment)?;
-                        sector += length as usize >> block::SECTOR_SHIFT;
+                        sector += u64::from(length) >> block::SECTOR_SHIFT;
                     }
                 }
             }
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index 54fe580b7b42..9e176f015ab8 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -178,16 +178,16 @@ pub fn bio_iter_mut<'a>(self: &'a mut Owned<Self>) -> BioIterator<'a> {
 
     /// Get the target sector for the request.
     #[inline(always)]
-    pub fn sector(&self) -> usize {
+    pub fn sector(&self) -> u64 {
         // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
-        unsafe { (*self.0.get()).__sector as usize }
+        unsafe { (*self.0.get()).__sector }
     }
 
     /// Get the size of the request in number of sectors.
     #[inline(always)]
-    pub fn sectors(&self) -> usize {
+    pub fn sectors(&self) -> u32 {
         // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
-        (unsafe { (*self.0.get()).__data_len as usize }) >> crate::block::SECTOR_SHIFT
+        (unsafe { (*self.0.get()).__data_len }) >> crate::block::SECTOR_SHIFT
     }
 
     /// Return a pointer to the [`RequestDataWrapper`] stored in the private area

-- 
2.51.2




* [PATCH v2 41/83] block: rust: introduce `kernel::block::error`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Block layer status codes, represented by `blk_status_t`, are only one
byte wide, unlike the general kernel error codes, which are C `int` errnos.

Add `BlkError` and `BlkResult` to handle these status codes.
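
The idea can be sketched outside the kernel as follows. This is an
illustration under assumptions, not the code added by this patch: the status
is modelled as a plain `u8`, and the free function `to_result` here builds
the error directly from `NonZeroU8::new`:

```rust
use core::num::NonZeroU8;

/// A one-byte error code that can never be zero, because zero means success.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BlkError(NonZeroU8);

/// A result whose error type is a one-byte block layer status.
type BlkResult<T = ()> = Result<T, BlkError>;

/// Map a raw status byte to a result: zero is success, anything else is an
/// error carrying the original code.
fn to_result(status: u8) -> BlkResult {
    match NonZeroU8::new(status) {
        None => Ok(()),
        Some(code) => Err(BlkError(code)),
    }
}

impl From<BlkError> for u8 {
    fn from(value: BlkError) -> Self {
        value.0.get()
    }
}
```

Because the wrapper holds a `NonZeroU8`, an error value can never be confused
with the success status.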

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 rust/kernel/block.rs | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 rust/kernel/error.rs |  3 +-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
index 96e48a2e6116..b3578f28871a 100644
--- a/rust/kernel/block.rs
+++ b/rust/kernel/block.rs
@@ -18,3 +18,97 @@
 /// The difference between the size of a page and the size of a sector,
 /// expressed as a power of two.
 pub const PAGE_SECTORS_SHIFT: u32 = bindings::PAGE_SECTORS_SHIFT;
+
+pub mod error {
+    //! Block layer errors.
+
+    use core::num::NonZeroU8;
+
+    pub mod code {
+        //! C compatible error codes for the block subsystem.
+        macro_rules! declare_err {
+            ($err:tt $(,)? $($doc:expr),+) => {
+                $(
+                    #[doc = $doc]
+                )*
+                    pub const $err: super::BlkError =
+                    match super::BlkError::try_from_blk_status(crate::bindings::$err as u8) {
+                        Some(err) => err,
+                        None => panic!("Invalid errno in `declare_err!`"),
+                    };
+            };
+        }
+
+        declare_err!(BLK_STS_NOTSUPP, "Operation not supported.");
+        declare_err!(BLK_STS_IOERR, "Generic IO error.");
+        declare_err!(BLK_STS_DEV_RESOURCE, "Device resource busy. Retry later.");
+    }
+
+    /// A wrapper around a 1-byte block layer error code.
+    #[derive(Clone, Copy, PartialEq, Eq)]
+    pub struct BlkError(NonZeroU8);
+
+    impl BlkError {
+        /// Create a [`BlkError`] from a `blk_status_t`.
+        ///
+        /// If the code is not known, this function will warn and return [`code::BLK_STS_IOERR`].
+        pub fn from_blk_status(status: bindings::blk_status_t) -> Self {
+            if let Some(error) = Self::try_from_blk_status(status) {
+                error
+            } else {
+                kernel::pr_warn!("Attempted to create `BlkError` from invalid value\n");
+                code::BLK_STS_IOERR
+            }
+        }
+
+        /// Convert `Self` to the underlying type.
+        pub fn to_blk_status(self) -> bindings::blk_status_t {
+            self.0.into()
+        }
+
+        /// Try to create a `Self` from a `blk_status_t`.
+        ///
+        /// Returns `None` if the conversion fails.
+        const fn try_from_blk_status(errno: bindings::blk_status_t) -> Option<Self> {
+            if errno == 0 {
+                None
+            } else {
+                Some(BlkError(
+                    // SAFETY: We just checked that `errno` is nonzero.
+                    unsafe { NonZeroU8::new_unchecked(errno) },
+                ))
+            }
+        }
+    }
+
+    impl From<BlkError> for u8 {
+        fn from(value: BlkError) -> Self {
+            value.0.into()
+        }
+    }
+
+    impl From<BlkError> for u32 {
+        fn from(value: BlkError) -> Self {
+            let value: u8 = value.0.into();
+            value.into()
+        }
+    }
+
+    impl From<kernel::error::Error> for BlkError {
+        fn from(_value: kernel::error::Error) -> Self {
+            code::BLK_STS_IOERR
+        }
+    }
+
+    /// A result with a [`BlkError`] error type.
+    pub type BlkResult<T = ()> = Result<T, BlkError>;
+
+    /// Convert a `blk_status_t` to a `BlkResult`.
+    pub fn to_result(status: bindings::blk_status_t) -> BlkResult {
+        if status == bindings::BLK_STS_OK {
+            Ok(())
+        } else {
+            Err(BlkError::from_blk_status(status))
+        }
+    }
+}
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 05cf869ac090..6dd14a72526f 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -163,8 +163,9 @@ pub fn to_errno(self) -> crate::ffi::c_int {
         self.0.get()
     }
 
+    /// Convert a generic kernel error to a block layer error.
     #[cfg(CONFIG_BLOCK)]
-    pub(crate) fn to_blk_status(self) -> bindings::blk_status_t {
+    pub fn to_blk_status(self) -> bindings::blk_status_t {
         // SAFETY: `self.0` is a valid error due to its invariant.
         unsafe { bindings::errno_to_blk_status(self.0.get()) }
     }

-- 
2.51.2




* [PATCH v2 78/83] block: rust: add max_sectors option to `GenDiskBuilder`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Allow drivers to set the maximum I/O size when building a `GenDisk`.
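
For illustration, the builder pattern this option plugs into can be sketched
in standalone Rust. `DiskBuilder` and `Limits` are hypothetical stand-ins for
`GenDiskBuilder` and the C queue limits, and `max_bytes` is a helper added
only for this example:

```rust
/// Hypothetical stand-in for the queue limits filled in at build time.
#[derive(Debug, Default, PartialEq)]
struct Limits {
    logical_block_size: u32,
    max_sectors: u32,
}

impl Limits {
    /// Maximum command size in bytes, assuming 512-byte sectors.
    fn max_bytes(&self) -> u64 {
        u64::from(self.max_sectors) << 9
    }
}

/// Hypothetical stand-in for the disk builder.
#[derive(Default)]
struct DiskBuilder {
    logical_block_size: u32,
    max_sectors: u32,
}

impl DiskBuilder {
    /// Set the logical block size in bytes.
    fn logical_block_size(mut self, size: u32) -> Self {
        self.logical_block_size = size;
        self
    }

    /// Set the maximum size of a command in 512-byte sectors.
    fn max_sectors(mut self, sectors: u32) -> Self {
        self.max_sectors = sectors;
        self
    }

    /// Copy the collected options into the limits structure.
    fn build(self) -> Limits {
        Limits {
            logical_block_size: self.logical_block_size,
            max_sectors: self.max_sectors,
        }
    }
}
```

With this sketch, `DiskBuilder::default().max_sectors(2048).build()` yields
limits that allow commands of up to 1 MiB.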

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 rust/kernel/block/mq/gen_disk.rs | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index a50ba7b605d7..6d760dafade5 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -58,6 +58,7 @@ pub struct GenDiskBuilder<T> {
     zone_append_max_sectors: u32,
     write_cache: bool,
     forced_unit_access: bool,
+    max_sectors: u32,
     _p: PhantomData<T>,
 }
 
@@ -77,6 +78,7 @@ fn default() -> Self {
             zone_append_max_sectors: 0,
             write_cache: false,
             forced_unit_access: false,
+            max_sectors: 0,
             _p: PhantomData,
         }
     }
@@ -181,6 +183,12 @@ pub fn write_cache(mut self, enable: bool) -> Self {
         self
     }
 
+    /// Maximum size of a command in 512-byte sectors.
+    pub fn max_sectors(mut self, sectors: u32) -> Self {
+        self.max_sectors = sectors;
+        self
+    }
+
     /// Build a new `GenDisk` and add it to the VFS.
     pub fn build(
         self,
@@ -199,6 +207,7 @@ pub fn build(
         lim.logical_block_size = self.logical_block_size;
         lim.physical_block_size = self.physical_block_size;
         lim.max_hw_discard_sectors = self.max_hw_discard_sectors;
+        lim.max_sectors = self.max_sectors;
         if self.rotational {
             lim.features = Feature::Rotational.into();
         }

-- 
2.51.2




* [PATCH v2 25/83] block: rnull: add no_sched module parameter and configfs attribute
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add support for disabling the default IO scheduler by adding:
- no_sched module parameter to control scheduler selection at device
  creation.
- no_sched configfs attribute (ID 11) for runtime configuration.
- Use of NO_DEFAULT_SCHEDULER flag when no_sched is enabled.

This allows bypassing the default 'mq-deadline' scheduler and using 'none'
instead, which can improve performance for certain workloads. The flag
selection logic is updated to use compound assignment operators for better
readability.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
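Note: the flag selection described above can be illustrated with a minimal
standalone sketch. `Flag`, `Flags`, and `select_flags` below are hypothetical
stand-ins for the `mq::tag_set` types, and the bit values are illustrative
only; they are not the kernel's `BLK_MQ_F_*` constants.

```rust
use std::ops::BitOrAssign;

/// Hypothetical stand-in for a single tag set flag. The bit positions are
/// made up for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum Flag {
    Blocking = 1 << 0,
    NoDefaultScheduler = 1 << 1,
}

/// Hypothetical stand-in for a set of flags.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Flags(u32);

impl Flags {
    /// Returns `true` if `flag` is set.
    pub fn contains(self, flag: Flag) -> bool {
        (self.0 & (flag as u32)) != 0
    }
}

impl BitOrAssign<Flag> for Flags {
    fn bitor_assign(&mut self, rhs: Flag) {
        self.0 |= rhs as u32;
    }
}

/// Start from the default (empty) set and add one flag per enabled option.
pub fn select_flags(memory_backed: bool, no_sched: bool) -> Flags {
    let mut flags = Flags::default();

    if memory_backed {
        flags |= Flag::Blocking;
    }

    if no_sched {
        flags |= Flag::NoDefaultScheduler;
    }

    flags
}
```

Accumulating into a mutable set keeps each option independent, so further
flags can be added without nesting another `if`/`else` expression.
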
 drivers/block/rnull/configfs.rs |  6 ++++++
 drivers/block/rnull/rnull.rs    | 25 ++++++++++++++++++-------
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index e47399cd45a4..d9aead646ae0 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -94,6 +94,7 @@ fn make_group(
                 use_per_node_hctx: 8,
                 home_node: 9,
                 discard: 10,
+                no_sched: 11,
             ],
         };
 
@@ -115,6 +116,7 @@ fn make_group(
                     submit_queues: 1,
                     home_node: bindings::NUMA_NO_NODE,
                     discard: false,
+                    no_sched: false,
                 }),
             }),
             core::iter::empty(),
@@ -183,6 +185,7 @@ struct DeviceConfigInner {
     submit_queues: u32,
     home_node: i32,
     discard: bool,
+    no_sched: bool,
 }
 
 #[vtable]
@@ -217,6 +220,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
                 submit_queues: guard.submit_queues,
                 home_node: guard.home_node,
                 discard: guard.discard,
+                no_sched: guard.no_sched,
             })?);
             guard.powered = true;
         } else if guard.powered && !power_op {
@@ -322,3 +326,5 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
         Ok(())
     })
 );
+
+configfs_simple_bool_field!(DeviceConfig, 11, no_sched);
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index bdc05b3f6072..cb5b642f68e5 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -30,8 +30,8 @@
     new_mutex,
     new_xarray,
     page::{
-        SafePage, //
-        PAGE_SIZE,
+        SafePage,
+        PAGE_SIZE, //
     },
     pr_info,
     prelude::*,
@@ -110,6 +110,10 @@
             description:
                 "Support discard operations (requires memory-backed null_blk device).",
         },
+        no_sched: bool {
+            default: false,
+            description: "No IO scheduler",
+        },
     },
 }
 
@@ -148,6 +152,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
                     submit_queues,
                     home_node: module_parameters::home_node.value(),
                     discard: module_parameters::discard.value(),
+                    no_sched: module_parameters::no_sched.value(),
                 })?;
                 disks.push(disk, GFP_KERNEL)?;
             }
@@ -173,6 +178,7 @@ struct NullBlkOptions<'a> {
     submit_queues: u32,
     home_node: i32,
     discard: bool,
+    no_sched: bool,
 }
 struct NullBlkDevice;
 
@@ -189,13 +195,18 @@ fn new(options: NullBlkOptions<'_>) -> Result<GenDisk<Self>> {
             submit_queues,
             home_node,
             discard,
+            no_sched,
         } = options;
 
-        let flags = if memory_backed {
-            mq::tag_set::Flag::Blocking.into()
-        } else {
-            mq::tag_set::Flags::default()
-        };
+        let mut flags = mq::tag_set::Flags::default();
+
+        if memory_backed {
+            flags |= mq::tag_set::Flag::Blocking;
+        }
+
+        if no_sched {
+            flags |= mq::tag_set::Flag::NoDefaultScheduler;
+        }
 
         if home_node > kernel::numa::num_online_nodes().try_into()? {
             return Err(code::EINVAL);

-- 
2.51.2




* [PATCH v2 74/83] block: rust: add `Request::queue_index`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add a method that returns the index of the hardware queue associated
with a request.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 rust/kernel/block/mq/request.rs | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index 05b167dfc6c6..54b5202567f8 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -191,6 +191,13 @@ pub fn hw_data(&self) -> <T::HwData as ForeignOwnable>::Borrowed<'_> {
         unsafe { T::HwData::borrow((*hctx).driver_data) }
     }
 
+    /// Get the queue index for the hardware queue associated with this request.
+    pub fn queue_index(&self) -> u32 {
+        // SAFETY: The request is guaranteed to be associated with a hardware
+        // context while we have access to it.
+        unsafe { (*self.hctx_raw()).queue_num }
+    }
+
     pub fn is_poll(&self) -> bool {
         let hctx = self.hctx_raw();
 

-- 
2.51.2




* [PATCH v2 81/83] block: rnull: add `virt_boundary` option
From: Andreas Hindborg @ 2026-06-09 19:09 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add a module parameter and a configfs attribute that set the virtual
boundary mask of the rnull block device to `PAGE_SIZE - 1`. This allows
testing how drivers and filesystems handle devices with specific
alignment requirements.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
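Note: a minimal standalone sketch of what a mask of `PAGE_SIZE - 1` expresses.
The page size of 4096 is an assumption for illustration, and `gap_between` is
a hypothetical helper that only approximates the kind of check a boundary mask
enables; it is not the block layer's implementation.

```rust
/// Assumed page size for this sketch; the real value depends on the
/// architecture and kernel configuration.
pub const PAGE_SIZE: usize = 4096;

/// The mask form used by the patch: all bits below the page size are set.
pub const VIRT_BOUNDARY_MASK: usize = PAGE_SIZE - 1;

/// Hypothetical helper: returns `true` if two adjacent segments leave a gap
/// with respect to the boundary, that is, if the previous segment does not
/// end on the boundary or the next one does not start on it.
pub fn gap_between(prev_end: usize, next_start: usize, mask: usize) -> bool {
    (prev_end & mask) != 0 || (next_start & mask) != 0
}
```

With this mask, a segment pair such as `(4096, 8192)` has no gap, while a
previous segment ending at `4608` or a next segment starting at offset `512`
does.
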
 drivers/block/rnull/configfs.rs |  5 +++++
 drivers/block/rnull/rnull.rs    | 17 ++++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index 5ab217e43e2b..3e054339226c 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -133,6 +133,7 @@ fn make_group(
                 poll_queues: 27,
                 fua: 28,
                 max_sectors: 29,
+                virt_boundary: 30,
             ],
         };
 
@@ -221,6 +222,7 @@ fn make_group(
                     #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
                     init_hctx_inject,
                     max_sectors: 0,
+                    virt_boundary: false,
                 }),
             }),
             default_groups,
@@ -315,6 +317,7 @@ struct DeviceConfigInner {
     #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
     init_hctx_inject: Arc<FaultConfig>,
     max_sectors: u32,
+    virt_boundary: bool,
 }
 
 #[vtable]
@@ -388,6 +391,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
                 #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
                 timeout_inject: guard.timeout_inject.clone(),
                 max_sectors: guard.max_sectors,
+                virt_boundary: guard.virt_boundary,
             })?);
             guard.powered = true;
         } else if guard.powered && !power_op {
@@ -617,3 +621,4 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
 }
 configfs_simple_bool_field!(DeviceConfig, 28, fua);
 configfs_simple_field!(DeviceConfig, 29, max_sectors, u32);
+configfs_simple_bool_field!(DeviceConfig, 30, virt_boundary);
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 15b8c365b9fa..147dc8498c3a 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -28,7 +28,10 @@
             BadBlocks, //
         },
         bio::Segment,
-        error::{BlkError, BlkResult},
+        error::{
+            BlkError,
+            BlkResult, //
+        },
         mq::{
             self,
             gen_disk::{
@@ -54,6 +57,7 @@
     memalloc_scope,
     new_mutex,
     new_spinlock,
+    page::PAGE_SIZE,
     pr_info,
     prelude::*,
     revocable::Revocable,
@@ -208,6 +212,10 @@
             default: 0,
             description: "Maximum size of a command (in 512B sectors)",
         },
+        virt_boundary: bool {
+            default: false,
+            description: "Set alignment requirement for IO buffers to be page size.",
+        },
     },
 }
 
@@ -312,6 +320,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
                     #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
                     timeout_inject: Arc::pin_init(FaultConfig::new(c"timeout_inject"), GFP_KERNEL)?,
                     max_sectors: module_parameters::max_sectors.value(),
+                    virt_boundary: module_parameters::virt_boundary.value(),
                 })?;
                 disks.push(disk, GFP_KERNEL)?;
             }
@@ -358,6 +367,7 @@ struct NullBlkOptions<'a> {
     #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
     timeout_inject: Arc<FaultConfig>,
     max_sectors: u32,
+    virt_boundary: bool,
 }
 
 #[pin_data]
@@ -494,6 +504,7 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
             #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
             timeout_inject,
             max_sectors,
+            virt_boundary,
         } = options;
 
         let memory_backed = tag_set.memory_backed;
@@ -558,6 +569,10 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
             .forced_unit_access(forced_unit_access && storage.cache_enabled())
             .max_sectors(max_sectors);
 
+        if virt_boundary {
+            builder = builder.virt_boundary_mask(PAGE_SIZE - 1);
+        }
+
         #[cfg(CONFIG_BLK_DEV_ZONED)]
         {
             builder = builder

-- 
2.51.2




* [PATCH v2 50/83] block: rust: move gendisk vtable construction to separate function
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Refactor the `GenDiskBuilder::build` method to move the `gendisk`
vtable construction into a separate helper function. This prepares for
adding zoned block device support which requires conditional vtable
setup.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
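Note: a minimal standalone sketch of why making the builder generic helps with
conditional vtable setup. All names below (`DeviceOps`, `ZONED`,
`report_zones_stub`, `Builder`) are hypothetical and much reduced; the sketch
only shows that an associated constant can depend on `T`, which a constant
local to a function could not do as directly.

```rust
use std::marker::PhantomData;

/// Hypothetical, much reduced stand-in for `block_device_operations`.
pub struct DeviceOps {
    pub report_zones: Option<fn() -> u32>,
    pub free_disk: Option<fn()>,
}

/// Hypothetical stand-in for the `Operations` trait.
pub trait Operations {
    /// Whether the driver wants a zone report callback (illustrative only).
    const ZONED: bool;
}

/// Placeholder callback used when `ZONED` is `true`.
fn report_zones_stub() -> u32 {
    0
}

/// Hypothetical stand-in for `GenDiskBuilder<T>`.
pub struct Builder<T>(PhantomData<T>);

impl<T: Operations> Builder<T> {
    /// One table per `T`, filled in at compile time.
    const VTABLE: DeviceOps = DeviceOps {
        report_zones: if T::ZONED {
            Some(report_zones_stub as fn() -> u32)
        } else {
            None
        },
        free_disk: None,
    };

    /// Returns a `'static` reference to the table for this `T`.
    pub const fn build_vtable() -> &'static DeviceOps {
        &Self::VTABLE
    }
}

/// Example driver without zone support.
pub struct Plain;
impl Operations for Plain {
    const ZONED: bool = false;
}

/// Example driver with zone support.
pub struct Zoned;
impl Operations for Zoned {
    const ZONED: bool = true;
}
```

`Builder::<Plain>::build_vtable()` and `Builder::<Zoned>::build_vtable()` then
return different tables without any runtime branching.
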
 drivers/block/rnull/configfs.rs  |  5 ++-
 rust/kernel/block/mq/gen_disk.rs | 67 +++++++++++++++++++++++-----------------
 2 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index 2dfc87dff66a..8fa16dbc2a75 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -290,7 +290,10 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
     }
 }
 
-configfs_simple_field!(DeviceConfig, 1, block_size, u32, check GenDiskBuilder::validate_block_size);
+configfs_simple_field!(DeviceConfig, 1,
+                       block_size, u32,
+                       check GenDiskBuilder::<NullBlkDevice>::validate_block_size
+);
 configfs_simple_bool_field!(DeviceConfig, 2, rotational);
 configfs_simple_field!(DeviceConfig, 3, capacity_mib, u64);
 configfs_simple_field!(DeviceConfig, 4, irq_mode, IRQMode);
diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index 49ce5ac4774d..79a67b545eca 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -34,20 +34,24 @@
         ScopeGuard, //
     },
 };
-use core::ptr::NonNull;
+use core::{
+    marker::PhantomData,
+    ptr::NonNull, //
+};
 
 /// A builder for [`GenDisk`].
 ///
 /// Use this struct to configure and add new [`GenDisk`] to the VFS.
-pub struct GenDiskBuilder {
+pub struct GenDiskBuilder<T> {
     rotational: bool,
     logical_block_size: u32,
     physical_block_size: u32,
     capacity_sectors: u64,
     max_hw_discard_sectors: u32,
+    _p: PhantomData<T>,
 }
 
-impl Default for GenDiskBuilder {
+impl<T> Default for GenDiskBuilder<T> {
     fn default() -> Self {
         Self {
             rotational: false,
@@ -55,11 +59,12 @@ fn default() -> Self {
             physical_block_size: bindings::PAGE_SIZE as u32,
             capacity_sectors: 0,
             max_hw_discard_sectors: 0,
+            _p: PhantomData,
         }
     }
 }
 
-impl GenDiskBuilder {
+impl<T: Operations> GenDiskBuilder<T> {
     /// Create a new instance.
     pub fn new() -> Self {
         Self::default()
@@ -126,7 +131,7 @@ pub fn max_hw_discard_sectors(mut self, max_hw_discard_sectors: u32) -> Self {
     }
 
     /// Build a new `GenDisk` and add it to the VFS.
-    pub fn build<T: Operations>(
+    pub fn build(
         self,
         name: fmt::Arguments<'_>,
         tagset: Arc<TagSet<T>>,
@@ -157,30 +162,8 @@ pub fn build<T: Operations>(
             )
         })?;
 
-        const TABLE: bindings::block_device_operations = bindings::block_device_operations {
-            submit_bio: None,
-            open: None,
-            release: None,
-            ioctl: None,
-            compat_ioctl: None,
-            check_events: None,
-            unlock_native_capacity: None,
-            getgeo: None,
-            set_read_only: None,
-            swap_slot_free_notify: None,
-            report_zones: None,
-            devnode: None,
-            alternative_gpt_sector: None,
-            get_unique_id: None,
-            // TODO: Set to `THIS_MODULE`.
-            owner: core::ptr::null_mut(),
-            pr_ops: core::ptr::null_mut(),
-            free_disk: None,
-            poll_bio: None,
-        };
-
         // SAFETY: `gendisk` is a valid pointer as we initialized it above
-        unsafe { (*gendisk).fops = &TABLE };
+        unsafe { (*gendisk).fops = Self::build_vtable() };
 
         let mut writer = NullTerminatedFormatter::new(
             // SAFETY: `gendisk` points to a valid and initialized instance. We
@@ -233,6 +216,34 @@ pub fn build<T: Operations>(
 
         Ok(disk.into())
     }
+
+    const VTABLE: bindings::block_device_operations = bindings::block_device_operations {
+        submit_bio: None,
+        open: None,
+        release: None,
+        ioctl: None,
+        compat_ioctl: None,
+        check_events: None,
+        unlock_native_capacity: None,
+        getgeo: None,
+        set_read_only: None,
+        swap_slot_free_notify: None,
+        report_zones: None,
+        devnode: None,
+        alternative_gpt_sector: None,
+        get_unique_id: None,
+        // TODO: Set to THIS_MODULE. Waiting for const_refs_to_static feature to
+        // be merged (unstable in rustc 1.78 which is staged for linux 6.10)
+        // <https://github.com/rust-lang/rust/issues/119618>
+        owner: core::ptr::null_mut(),
+        pr_ops: core::ptr::null_mut(),
+        free_disk: None,
+        poll_bio: None,
+    };
+
+    pub(crate) const fn build_vtable() -> &'static bindings::block_device_operations {
+        &Self::VTABLE
+    }
 }
 
 /// A generic block device.

-- 
2.51.2




* [PATCH v2 36/83] block: rust: implement `Sync` for `GenDisk`.
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

`GenDisk` is a pointer to a `struct gendisk`. It is safe to reference this
struct from multiple threads.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
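Note: a minimal standalone sketch of a conditional `Sync` implementation for a
type that holds a raw pointer. `Disk` and `is_sync` are hypothetical names;
the raw `*mut u8` stands in for the `struct gendisk` pointer, and the bound on
`Q` stands in for the bounds used in the patch.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `GenDisk<T>`. The raw pointer field removes the
/// automatic `Send` and `Sync` implementations.
pub struct Disk<Q> {
    raw: *mut u8,
    _queue_data: PhantomData<Q>,
}

impl<Q> Disk<Q> {
    /// Creates a handle around a null pointer; sufficient for this sketch
    /// because the pointer is never dereferenced.
    pub fn new() -> Self {
        Disk {
            raw: std::ptr::null_mut(),
            _queue_data: PhantomData,
        }
    }

    /// Returns `true` if the wrapped pointer is null.
    pub fn is_null(&self) -> bool {
        self.raw.is_null()
    }
}

// SAFETY (sketch only): the pointer is never dereferenced here, and shared
// access is only allowed when the associated data `Q` is itself `Sync`.
unsafe impl<Q: Sync> Sync for Disk<Q> {}

/// Compile-time check helper: only compiles when `S` is `Sync`.
pub fn is_sync<S: Sync>() -> bool {
    true
}
```

`is_sync::<Disk<u32>>()` compiles, while `is_sync::<Disk<std::cell::Cell<u32>>>()`
is rejected by the compiler because `Cell<u32>` is not `Sync`.
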
 rust/kernel/block/mq/gen_disk.rs | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index 2b204b0ed49a..94af85fe1716 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -234,6 +234,17 @@ unsafe impl<T> Send for GenDisk<T>
 {
 }
 
+// SAFETY: `GenDisk` is an owned pointer to a `struct gendisk` and an `Arc` to a `TagSet`. It is
+// safe to reference these from multiple threads if the `Arc` and the `gendisk` private data are
+// `Sync`.
+unsafe impl<T> Sync for GenDisk<T>
+where
+    T: Operations,
+    T::QueueData: Sync,
+    Arc<TagSet<T>>: Sync,
+{
+}
+
 impl<T: Operations> Drop for GenDisk<T> {
     fn drop(&mut self) {
         // SAFETY: By type invariant of `Self`, `self.gendisk` points to a valid

-- 
2.51.2




* [PATCH v2 71/83] block: rust: remove the `is_poll` parameter from `queue_rq`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

The information can now be obtained from `Request::is_poll`.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 drivers/block/rnull/rnull.rs       | 1 -
 rust/kernel/block/mq.rs            | 1 -
 rust/kernel/block/mq/operations.rs | 7 -------
 3 files changed, 9 deletions(-)

diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 32af69bbf8f0..8e17b2b17a66 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -960,7 +960,6 @@ fn queue_rq(
         this: ArcBorrow<'_, Self>,
         rq: Owned<mq::IdleRequest<Self>>,
         is_last: bool,
-        _is_poll: bool,
     ) -> BlkResult {
         Ok(Self::queue_rq_internal(hw_data, this, rq, is_last)?)
     }
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index e8f0d03f2ff7..47e1f860c6ba 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -90,7 +90,6 @@
 //!         _queue_data: (),
 //!         rq: Owned<IdleRequest<Self>>,
 //!         _is_last: bool,
-//!         is_poll: bool
 //!     ) -> BlkResult {
 //!         rq.start().end_ok();
 //!         Ok(())
diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
index 505e7d2b2253..d28af9a5e006 100644
--- a/rust/kernel/block/mq/operations.rs
+++ b/rust/kernel/block/mq/operations.rs
@@ -93,7 +93,6 @@ fn queue_rq(
         queue_data: ForeignBorrowed<'_, Self::QueueData>,
         rq: Owned<IdleRequest<Self>>,
         is_last: bool,
-        is_poll: bool,
     ) -> BlkResult;
 
     /// Called by the kernel to queue a list of requests with the driver.
@@ -214,11 +213,6 @@ impl<T: Operations> OperationsVTable<T> {
         // `into_foreign` in `Self::init_hctx_callback`.
         let hw_data = unsafe { T::HwData::borrow((*hctx).driver_data) };
 
-        let is_poll = u32::from(
-            // SAFETY: `hctx` is valid as required by this function.
-            unsafe { (*hctx).type_ },
-        ) == bindings::hctx_type_HCTX_TYPE_POLL;
-
         // SAFETY: `hctx` is valid as required by this function.
         let queue_data = unsafe { (*(*hctx).queue).queuedata };
 
@@ -235,7 +229,6 @@ impl<T: Operations> OperationsVTable<T> {
             // SAFETY: `bd` is valid as required by the safety requirement for
             // this function.
             unsafe { (*bd).last },
-            is_poll,
         );
 
         if let Err(e) = ret {

-- 
2.51.2




* [PATCH v2 08/83] block: rust: add `Request` private data support
From: Andreas Hindborg @ 2026-06-09 19:07 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux, Andreas Hindborg
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

From: Andreas Hindborg <a.hindborg@samsung.com>

C block device drivers can attach private data to a `struct request`. This
data is stored next to the request structure and is part of the request
allocation set up during driver initialization.

Expose this private request data area to Rust block device drivers.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
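Note: a minimal standalone sketch of the layout idea described above. The
names mirror the patch, but the types are simplified user-space stand-ins:
`AtomicU32` replaces the kernel `Refcount`, and `cmd_size` is a hypothetical
helper showing how the size of the per-request area could be derived from the
wrapper type.

```rust
use std::convert::TryFrom;
use std::mem::size_of;
use std::num::TryFromIntError;
use std::ptr::addr_of_mut;
use std::sync::atomic::AtomicU32;

/// Reduced stand-in for the `Operations` trait.
pub trait Operations {
    /// Driver data stored next to each request.
    type RequestData: Sized + Sync + Send;
}

/// Stand-in for the wrapper placed in the private area of a request.
pub struct RequestDataWrapper<T: Operations> {
    refcount: AtomicU32,
    data: T::RequestData,
}

impl<T: Operations> RequestDataWrapper<T> {
    /// Returns a pointer to the `data` field without creating a reference.
    ///
    /// # Safety
    ///
    /// `this` must point to a live allocation of at least the size of `Self`.
    pub unsafe fn data_ptr(this: *mut Self) -> *mut T::RequestData {
        // SAFETY: the caller guarantees that `this` points to a live
        // allocation, so the field projection stays in bounds.
        unsafe { addr_of_mut!((*this).data) }
    }
}

/// Hypothetical helper: size of the per-request private area as a `u32`.
pub fn cmd_size<T: Operations>() -> Result<u32, TryFromIntError> {
    u32::try_from(size_of::<RequestDataWrapper<T>>())
}

/// Driver without request data, as in the rnull change in this patch.
pub struct NoData;
impl Operations for NoData {
    type RequestData = ();
}

/// Driver with some example request data.
pub struct WithData;
impl Operations for WithData {
    type RequestData = [u64; 4];
}
```

Because `RequestData` is part of the wrapper type, a driver that uses `()`
pays only for the refcount, while a driver with real data gets a larger
private area without any separate allocation.
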
 drivers/block/rnull/rnull.rs       |  5 +++++
 rust/kernel/block/mq.rs            |  6 ++++++
 rust/kernel/block/mq/operations.rs | 26 +++++++++++++++++++++++++-
 rust/kernel/block/mq/request.rs    | 24 +++++++++++++++++++-----
 rust/kernel/block/mq/tag_set.rs    | 24 +++++++++++++++++++-----
 5 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 69cf62475446..dd7a30519870 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -132,6 +132,11 @@ struct QueueData {
 #[vtable]
 impl Operations for NullBlkDevice {
     type QueueData = KBox<QueueData>;
+    type RequestData = ();
+
+    fn new_request_data() -> impl PinInit<Self::RequestData> {
+        Ok(())
+    }
 
     #[inline(always)]
     fn queue_rq(queue_data: &QueueData, rq: Owned<mq::Request<Self>>, _is_last: bool) -> Result {
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index b8ecd69abe98..7718b106eb49 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -69,8 +69,14 @@
 //!
 //! #[vtable]
 //! impl Operations for MyBlkDevice {
+//!     type RequestData = ();
 //!     type QueueData = ();
 //!
+//!     fn new_request_data(
+//!     ) -> impl PinInit<()> {
+//!         Ok(())
+//!     }
+//!
 //!     fn queue_rq(_queue_data: (), rq: Owned<Request<Self>>, _is_last: bool) -> Result {
 //!         rq.end_ok();
 //!         Ok(())
diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
index bb23a32f3983..c49ca2e8bbb2 100644
--- a/rust/kernel/block/mq/operations.rs
+++ b/rust/kernel/block/mq/operations.rs
@@ -29,6 +29,7 @@
     marker::PhantomData,
     ptr::NonNull, //
 };
+use pin_init::PinInit;
 
 type ForeignBorrowed<'a, T> = <T as ForeignOwnable>::Borrowed<'a>;
 
@@ -44,10 +45,27 @@
 /// [module level documentation]: kernel::block::mq
 #[macros::vtable]
 pub trait Operations: Sized {
+    /// Data associated with a request. This data is located next to the request
+    /// structure.
+    ///
+    /// To be able to handle accessing this data from interrupt context, this
+    /// data must be `Sync`.
+    ///
+    /// Requests may be cleaned up by a thread different from the allocating thread, so
+    /// `RequestData` must be `Send`.
+    ///
+    /// The `RequestData` object is initialized when the requests are allocated
+    /// during queue initialization, and it is dropped when the requests are
+    /// dropped during queue teardown.
+    type RequestData: Sized + Sync + Send;
+
     /// Data associated with the `struct request_queue` that is allocated for
     /// the `GenDisk` associated with this `Operations` implementation.
     type QueueData: ForeignOwnable + Sync;
 
+    /// Called by the kernel to get an initializer for a `Pin<&mut RequestData>`.
+    fn new_request_data() -> impl PinInit<Self::RequestData>;
+
     /// Called by the kernel to queue a request with the driver. If `is_last` is
     /// `false`, the driver is allowed to defer committing the request.
     fn queue_rq(
@@ -252,6 +270,12 @@ impl<T: Operations> OperationsVTable<T> {
             // it is valid for writes.
             unsafe { RequestDataWrapper::refcount_ptr(pdu.as_ptr()).write(Refcount::new(0)) };
 
+            let initializer = T::new_request_data();
+
+            // SAFETY: `pdu` is a valid pointer as established above. We do not touch `pdu` if
+            // `__pinned_init` returns an error. We promise not to move the pointee of `pdu`.
+            unsafe { initializer.__pinned_init(RequestDataWrapper::data_ptr(pdu.as_ptr()))? };
+
             Ok(0)
         })
     }
@@ -271,7 +295,7 @@ impl<T: Operations> OperationsVTable<T> {
     ) {
         // SAFETY: The tagset invariants guarantee that all requests are allocated with extra memory
         // for the request data.
-        let pdu = unsafe { bindings::blk_mq_rq_to_pdu(rq) }.cast::<RequestDataWrapper>();
+        let pdu = unsafe { bindings::blk_mq_rq_to_pdu(rq) }.cast::<RequestDataWrapper<T>>();
 
         // SAFETY: `pdu` is valid for read and write and is properly initialised.
         unsafe { core::ptr::drop_in_place(pdu) };
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index 7444de3c8522..1882d697dcf3 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -107,12 +107,12 @@ pub fn complete(this: ARef<Self>) {
     ///
     /// - `this` must point to a valid allocation of size at least size of
     ///   [`Self`] plus size of [`RequestDataWrapper`].
-    pub(crate) unsafe fn wrapper_ptr(this: *mut Self) -> NonNull<RequestDataWrapper> {
+    pub(crate) unsafe fn wrapper_ptr(this: *mut Self) -> NonNull<RequestDataWrapper<T>> {
         let request_ptr = this.cast::<bindings::request>();
         // SAFETY: By safety requirements for this function, `this` is a
         // valid allocation.
         let wrapper_ptr =
-            unsafe { bindings::blk_mq_rq_to_pdu(request_ptr).cast::<RequestDataWrapper>() };
+            unsafe { bindings::blk_mq_rq_to_pdu(request_ptr).cast::<RequestDataWrapper<T>>() };
         // SAFETY: By C API contract, `wrapper_ptr` points to a valid allocation
         // and is not null.
         unsafe { NonNull::new_unchecked(wrapper_ptr) }
@@ -120,7 +120,7 @@ pub(crate) unsafe fn wrapper_ptr(this: *mut Self) -> NonNull<RequestDataWrapper>
 
     /// Return a reference to the [`RequestDataWrapper`] stored in the private
     /// area of the request structure.
-    pub(crate) fn wrapper_ref(&self) -> &RequestDataWrapper {
+    pub(crate) fn wrapper_ref(&self) -> &RequestDataWrapper<T> {
         // SAFETY: By type invariant, `self.0` is a valid allocation. Further,
         // the private data associated with this request is initialized and
         // valid. The existence of `&self` guarantees that the private data is
@@ -132,16 +132,19 @@ pub(crate) fn wrapper_ref(&self) -> &RequestDataWrapper {
 /// A wrapper around data stored in the private area of the C [`struct request`].
 ///
 /// [`struct request`]: srctree/include/linux/blk-mq.h
-pub(crate) struct RequestDataWrapper {
+pub(crate) struct RequestDataWrapper<T: Operations> {
     /// The Rust request refcount has the following states:
     ///
     /// - 0: The request is owned by C block layer.
     /// - 1: The request is owned by Rust abstractions but there are no [`ARef`] references to it.
     /// - 2+: There are [`ARef`] references to the request.
     refcount: Refcount,
+
+    /// Driver-managed request data.
+    data: T::RequestData,
 }
 
-impl RequestDataWrapper {
+impl<T: Operations> RequestDataWrapper<T> {
     /// Return a reference to the refcount of the request that is embedding
     /// `self`.
     pub(crate) fn refcount(&self) -> &Refcount {
@@ -159,6 +162,17 @@ pub(crate) unsafe fn refcount_ptr(this: *mut Self) -> *mut Refcount {
         // field projection is safe.
         unsafe { &raw mut (*this).refcount }
     }
+
+    /// Return a pointer to the `data` field of the `Self` pointed to by `this`.
+    ///
+    /// # Safety
+    ///
+    /// - `this` must point to a live allocation of at least the size of `Self`.
+    pub(crate) unsafe fn data_ptr(this: *mut Self) -> *mut T::RequestData {
+        // SAFETY: Because of the safety requirements of this function, the
+        // field projection is safe.
+        unsafe { &raw mut (*this).data }
+    }
 }
 
 // SAFETY: Exclusive access is thread-safe for `Request`. `Request` has no `&mut
diff --git a/rust/kernel/block/mq/tag_set.rs b/rust/kernel/block/mq/tag_set.rs
index dae9df408a86..ec5cac48b83f 100644
--- a/rust/kernel/block/mq/tag_set.rs
+++ b/rust/kernel/block/mq/tag_set.rs
@@ -8,13 +8,27 @@
 
 use crate::{
     bindings,
-    block::mq::{operations::OperationsVTable, request::RequestDataWrapper, Operations},
-    error::{self, Result},
+    block::mq::{
+        operations::OperationsVTable,
+        request::RequestDataWrapper,
+        Operations, //
+    },
+    error::{
+        self,
+        Result, //
+    },
     prelude::try_pin_init,
     types::Opaque,
 };
-use core::{convert::TryInto, marker::PhantomData};
-use pin_init::{pin_data, pinned_drop, PinInit};
+use core::{
+    convert::TryInto,
+    marker::PhantomData, //
+};
+use pin_init::{
+    pin_data,
+    pinned_drop,
+    PinInit, //
+};
 
 /// A wrapper for the C `struct blk_mq_tag_set`.
 ///
@@ -39,7 +53,7 @@ pub fn new(
         num_maps: u32,
     ) -> impl PinInit<Self, error::Error> {
         let tag_set: bindings::blk_mq_tag_set = pin_init::zeroed();
-        let tag_set: Result<_> = core::mem::size_of::<RequestDataWrapper>()
+        let tag_set: Result<_> = size_of::<RequestDataWrapper<T>>()
             .try_into()
             .map(|cmd_size| {
                 bindings::blk_mq_tag_set {

-- 
2.51.2




* [PATCH v2 76/83] block: rust: add `request_timeout` hook
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add a hook for the request timeout feature. This allows the kernel to call
into a block device driver when it decides a request has timed out. Rust
block device drivers can now implement `Operations::request_timeout` to
respond to request timeouts.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
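Note: a minimal standalone sketch of the return value contract described
above. The constants `BLK_EH_DONE` and `BLK_EH_RESET_TIMER` are assumed here
to be 0 and 1, following the order of the C `enum blk_eh_timer_return`; in the
patch they come from `bindings`. The checked `TryFrom` conversion and the
`on_timeout` function are hypothetical and are not the conversion or hook
signature used by the patch.

```rust
use std::convert::TryFrom;

/// Assumed values of the C `enum blk_eh_timer_return` members.
pub const BLK_EH_DONE: u32 = 0;
pub const BLK_EH_RESET_TIMER: u32 = 1;

/// Stand-in for the status returned from a timeout handler.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum RequestTimeoutStatus {
    /// The request was completed.
    Completed = BLK_EH_DONE,
    /// The request is still processing, retry later.
    RetryLater = BLK_EH_RESET_TIMER,
}

impl TryFrom<u32> for RequestTimeoutStatus {
    type Error = u32;

    /// Checked conversion: unknown values are returned as the error.
    fn try_from(value: u32) -> Result<Self, Self::Error> {
        match value {
            BLK_EH_DONE => Ok(RequestTimeoutStatus::Completed),
            BLK_EH_RESET_TIMER => Ok(RequestTimeoutStatus::RetryLater),
            other => Err(other),
        }
    }
}

/// Hypothetical handler logic: report completion only if the request was
/// actually completed during the call, otherwise ask for a later retry.
pub fn on_timeout(completed_during_call: bool) -> RequestTimeoutStatus {
    if completed_during_call {
        RequestTimeoutStatus::Completed
    } else {
        RequestTimeoutStatus::RetryLater
    }
}
```

The `#[repr(u32)]` attribute lets the enum be cast with `as u32` when handing
the value back to C code.
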
 rust/kernel/block.rs               |  1 +
 rust/kernel/block/mq.rs            |  3 +-
 rust/kernel/block/mq/operations.rs | 78 +++++++++++++++++++++++++++++++++++++-
 rust/kernel/block/mq/tag_set.rs    |  1 -
 4 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
index b3578f28871a..23795dbe08c3 100644
--- a/rust/kernel/block.rs
+++ b/rust/kernel/block.rs
@@ -42,6 +42,7 @@ macro_rules! declare_err {
         declare_err!(BLK_STS_NOTSUPP, "Operation not supported.");
         declare_err!(BLK_STS_IOERR, "Generic IO error.");
         declare_err!(BLK_STS_DEV_RESOURCE, "Device resource busy. Retry later.");
+        declare_err!(BLK_STS_TIMEOUT, "Operation timed out.");
     }
 
     /// A wrapper around a 1 byte block layer error code.
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index 47e1f860c6ba..a306181d88ce 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -138,7 +138,8 @@
 };
 pub use operations::{
     IoCompletionBatch,
-    Operations, //
+    Operations,
+    RequestTimeoutStatus, //
 };
 pub use request::{
     Command,
diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
index d28af9a5e006..2b340675f976 100644
--- a/rust/kernel/block/mq/operations.rs
+++ b/rust/kernel/block/mq/operations.rs
@@ -151,6 +151,51 @@ fn report_zones(
     fn map_queues(_tag_set: Pin<&mut TagSet<Self>>) {
         build_error!(crate::error::VTABLE_DEFAULT_ERROR)
     }
+
+    /// Called by the kernel when a request has been queued with the driver for too long.
+    ///
+    /// We identify the request by `queue_id` and `tag` as we cannot pass
+    /// `Owned<Request>` or `ARef<Request>`. The driver may hold either of these
+    /// already.
+    ///
+    /// A driver can use [`TagSet::tag_to_rq`] to try to obtain a request reference.
+    ///
+    /// A driver must return [`RequestTimeoutStatus::Completed`] if the request
+    /// was completed during the call. Otherwise
+    /// [`RequestTimeoutStatus::RetryLater`] must be returned, and the kernel
+    /// will retry the call later.
+    fn request_timeout(_tag_set: &TagSet<Self>, _queue_id: u32, _tag: u32) -> RequestTimeoutStatus {
+        build_error!(crate::error::VTABLE_DEFAULT_ERROR)
+    }
+}
+
+/// Return value for [`Operations::request_timeout`].
+#[repr(u32)]
+pub enum RequestTimeoutStatus {
+    /// The request was completed.
+    Completed = bindings::blk_eh_timer_return_BLK_EH_DONE,
+
+    /// The request is still processing, retry later.
+    RetryLater = bindings::blk_eh_timer_return_BLK_EH_RESET_TIMER,
+}
+
+impl RequestTimeoutStatus {
+    /// Create a [`RequestTimeoutStatus`] from an integer.
+    ///
+    /// # Safety
+    ///
+    /// - `value` must be one of the enum values declared for [`bindings::blk_eh_timer_return`].
+    pub unsafe fn from_raw(value: u32) -> Self {
+        // SAFETY: By function safety requirements, value is usable as `Self`.
+        unsafe { core::mem::transmute(value) }
+    }
+}
+
+impl From<RequestTimeoutStatus> for u32 {
+    fn from(value: RequestTimeoutStatus) -> Self {
+        // CAST: `RequestTimeoutStatus` is `#[repr(u32)]`, so the cast yields its discriminant.
+        value as u32
+    }
 }
 
 /// A vtable for blk-mq to interact with a block device driver.
@@ -521,6 +566,33 @@ impl<T: Operations> OperationsVTable<T> {
         T::map_queues(tag_set);
     }
 
+    /// This function is called by the block layer when a request has been
+    /// queued with the driver for too long.
+    ///
+    /// # Safety
+    ///
+    /// - This function may only be called by blk-mq C infrastructure.
+    /// - `rq` must point to an initialized and valid `Request`.
+    unsafe extern "C" fn request_timeout_callback(
+        rq: *mut bindings::request,
+    ) -> bindings::blk_eh_timer_return {
+        // SAFETY: `rq` is valid and initialized.
+        let hctx = unsafe { (*rq).mq_hctx };
+        // SAFETY: `rq` is valid and initialized, so `hctx` is also valid and initialized.
+        let qid = unsafe { (*hctx).queue_num };
+        // SAFETY: `rq` is valid and initialized.
+        let tag = unsafe { (*rq).tag } as u32;
+        // SAFETY: `rq` is valid and initialized, so `hctx` is also valid and initialized.
+        let queue = unsafe { (*hctx).queue };
+        // SAFETY: `rq` is valid and initialized, so is `queue`.
+        let tag_set = unsafe { (*queue).tag_set };
+        // SAFETY: As `rq` is valid, so is `tag_set`. We never create mutable references to a
+        // `TagSet` without proper locking.
+        let tag_set: &TagSet<T> = unsafe { TagSet::from_ptr(tag_set) };
+
+        T::request_timeout(tag_set, qid, tag).into()
+    }
+
     const VTABLE: bindings::blk_mq_ops = bindings::blk_mq_ops {
         queue_rq: Some(Self::queue_rq_callback),
         queue_rqs: if T::HAS_QUEUE_RQS {
@@ -533,7 +605,11 @@ impl<T: Operations> OperationsVTable<T> {
         put_budget: None,
         set_rq_budget_token: None,
         get_rq_budget_token: None,
-        timeout: None,
+        timeout: if T::HAS_REQUEST_TIMEOUT {
+            Some(Self::request_timeout_callback)
+        } else {
+            None
+        },
         poll: if T::HAS_POLL {
             Some(Self::poll_callback)
         } else {
diff --git a/rust/kernel/block/mq/tag_set.rs b/rust/kernel/block/mq/tag_set.rs
index 66b6a30a9e66..6d3882c01d9d 100644
--- a/rust/kernel/block/mq/tag_set.rs
+++ b/rust/kernel/block/mq/tag_set.rs
@@ -126,7 +126,6 @@ pub fn flags(&self) -> Flags {
     /// `ptr` must be a pointer to a valid and initialized `TagSet<T>`. There
     /// may be no other mutable references to the tag set. The pointee must be
     /// live and valid at least for the duration of the returned lifetime `'a`.
-    #[expect(dead_code)]
     pub(crate) unsafe fn from_ptr<'a>(ptr: *mut bindings::blk_mq_tag_set) -> &'a Self {
         // SAFETY: By the safety requirements of this function, `ptr` is valid
         // for use as a reference for the duration of `'a`.

-- 
2.51.2




* [PATCH v2 35/83] block: rnull: add volatile cache emulation
From: Andreas Hindborg @ 2026-06-09 19:08 UTC
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add volatile cache emulation to rnull. When enabled via the
`cache_size_mib` configfs attribute, writes are first stored in a volatile
cache before being written back to the simulated non-volatile storage.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
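
The write-back behaviour described in the commit message can be modelled in
userspace roughly as follows. This is a sketch under simplifying assumptions,
not the driver code: `ToyStorage` and its methods are hypothetical, pages are
plain `Vec<u8>` values, only whole-page writes are modelled (the driver also
tracks per-sector occupancy), and the cache size is given in bytes.

```rust
use std::collections::BTreeMap;

pub const PAGE_SIZE: u64 = 4096;

/// Hypothetical model of a disk with an optional volatile write cache.
pub struct ToyStorage {
    cache: BTreeMap<usize, Vec<u8>>,
    disk: BTreeMap<usize, Vec<u8>>,
    /// Cache capacity in bytes; 0 disables the cache.
    cache_size: u64,
    cache_used: u64,
    /// Page index at which the next flush scan starts.
    next_flush: usize,
}

impl ToyStorage {
    pub fn new(cache_size: u64) -> Self {
        Self {
            cache: BTreeMap::new(),
            disk: BTreeMap::new(),
            cache_size,
            cache_used: 0,
            next_flush: 0,
        }
    }

    /// Move one cached page to disk, scanning circularly from `next_flush`.
    fn flush_one(&mut self) {
        let index = self
            .cache
            .range(self.next_flush..)
            .next()
            .or_else(|| self.cache.iter().next())
            .map(|(&i, _)| i)
            .expect("cache is non-empty when it is full");
        let page = self.cache.remove(&index).expect("index was just found");
        self.disk.insert(index, page);
        self.cache_used -= PAGE_SIZE;
        self.next_flush = index.wrapping_add(1);
    }

    /// Writes land in the cache when it is enabled, otherwise on disk.
    pub fn write_page(&mut self, index: usize, data: Vec<u8>) {
        if self.cache_size == 0 {
            self.disk.insert(index, data);
            return;
        }
        if !self.cache.contains_key(&index) {
            if self.cache_used >= self.cache_size {
                self.flush_one();
            }
            self.cache_used += PAGE_SIZE;
        }
        self.cache.insert(index, data);
    }

    /// Reads prefer the cache and fall back to disk.
    pub fn read_page(&self, index: usize) -> Option<&Vec<u8>> {
        self.cache.get(&index).or_else(|| self.disk.get(&index))
    }

    pub fn is_cached(&self, index: usize) -> bool {
        self.cache.contains_key(&index)
    }

    pub fn is_on_disk(&self, index: usize) -> bool {
        self.disk.contains_key(&index)
    }
}
```

With a two-page cache, a third write to a new page pushes the first cached
page at or after the flush cursor out to disk, and the cursor then advances
past that page so later evictions continue around the index space.
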
 drivers/block/rnull/configfs.rs          |  35 +++-
 drivers/block/rnull/disk_storage.rs      | 260 +++++++++++++++++++++++++
 drivers/block/rnull/disk_storage/page.rs |  77 ++++++++
 drivers/block/rnull/rnull.rs             | 314 ++++++++++++++++++-------------
 4 files changed, 545 insertions(+), 141 deletions(-)
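
The per-page occupancy bitmap kept by `NullBlockPage` uses one bit per
512-byte sector, and `is_full` samples only the first sector of each block.
The arithmetic can be checked with a small userspace sketch. The constants
below assume 4 KiB pages and 512-byte sectors, and `Occupancy` and
`page_index` are hypothetical names used only for this illustration.

```rust
// Assumed geometry: 4 KiB pages, 512-byte sectors, so 8 sectors per page.
pub const SECTOR_SHIFT: u32 = 9;
pub const PAGE_SIZE: usize = 4096;
pub const PAGE_SECTORS: usize = PAGE_SIZE >> SECTOR_SHIFT;
pub const PAGE_SECTORS_SHIFT: u32 = 3;
pub const PAGE_SECTOR_MASK: u64 = PAGE_SECTORS as u64 - 1;

/// Hypothetical stand-in for the status word of `NullBlockPage`.
pub struct Occupancy {
    status: u64,
    block_size: usize,
}

impl Occupancy {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size.is_power_of_two());
        assert!(block_size >= 512 && block_size <= PAGE_SIZE);
        Self { status: 0, block_size }
    }

    /// Mark the sector's slot within its page as holding data.
    pub fn set_occupied(&mut self, sector: u64) {
        self.status |= 1u64 << (sector & PAGE_SECTOR_MASK);
    }

    /// Clear the sector's slot within its page.
    pub fn set_free(&mut self, sector: u64) {
        self.status &= !(1u64 << (sector & PAGE_SECTOR_MASK));
    }

    pub fn is_empty(&self) -> bool {
        self.status == 0
    }

    /// A page is full when the first sector of every block is marked.
    pub fn is_full(&self) -> bool {
        let blocks_per_page = PAGE_SIZE >> self.block_size.trailing_zeros();
        let sectors_per_block = PAGE_SECTORS / blocks_per_page;
        (0..blocks_per_page).all(|i| self.status & (1u64 << (i * sectors_per_block)) != 0)
    }
}

/// Page index in the tree for a given sector.
pub fn page_index(sector: u64) -> u64 {
    sector >> PAGE_SECTORS_SHIFT
}
```

With a 4096-byte block size a single marked sector at the start of the page
makes the page full, while a 512-byte block size needs all eight bits set.
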

diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index 0e9fe8cdc07f..504bb477c2d0 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -1,9 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0
 
 use super::{
+    DiskStorage,
     NullBlkDevice,
     THIS_MODULE, //
 };
+use core::fmt::Write;
 use kernel::{
     bindings,
     block::{
@@ -18,10 +20,7 @@
         AttributeOperations, //
     },
     configfs_attrs,
-    fmt::{
-        self,
-        Write as _, //
-    },
+    fmt,
     new_mutex,
     page::PAGE_SIZE,
     prelude::*,
@@ -104,17 +103,19 @@ fn make_group(
                 badblocks: 12,
                 badblocks_once: 13,
                 badblocks_partial_io: 14,
+                cache_size_mib: 15,
             ],
         };
 
+        let block_size = 4096;
         Ok(configfs::Group::new(
             name.try_into()?,
             item_type,
             // TODO: cannot coerce new_mutex!() to impl PinInit<_, Error>, so put mutex inside
-            try_pin_init!( DeviceConfig {
+            try_pin_init!(DeviceConfig {
                 data <- new_mutex!(DeviceConfigInner {
                     powered: false,
-                    block_size: 4096,
+                    block_size,
                     rotational: false,
                     disk: None,
                     capacity_mib: 4096,
@@ -129,6 +130,11 @@ fn make_group(
                     bad_blocks: Arc::pin_init(BadBlocks::new(false), GFP_KERNEL)?,
                     bad_blocks_once: false,
                     bad_blocks_partial_io: false,
+                    disk_storage: Arc::pin_init(
+                        DiskStorage::new(0, block_size as usize),
+                        GFP_KERNEL
+                    )?,
+                    cache_size_mib: 0,
                 }),
             }),
             core::iter::empty(),
@@ -201,6 +207,8 @@ struct DeviceConfigInner {
     bad_blocks: Arc<BadBlocks>,
     bad_blocks_once: bool,
     bad_blocks_partial_io: bool,
+    cache_size_mib: u64,
+    disk_storage: Arc<DiskStorage>,
 }
 
 #[vtable]
@@ -239,6 +247,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
                 bad_blocks: guard.bad_blocks.clone(),
                 bad_blocks_once: guard.bad_blocks_once,
                 bad_blocks_partial_io: guard.bad_blocks_partial_io,
+                storage: guard.disk_storage.clone(),
             })?);
             guard.powered = true;
         } else if guard.powered && !power_op {
@@ -250,6 +259,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
     }
 }
 
+// DiskStorage::new(cache_size_mib << 20, block_size as usize),
 configfs_simple_field!(DeviceConfig, 1, block_size, u32, check GenDiskBuilder::validate_block_size);
 configfs_simple_bool_field!(DeviceConfig, 2, rotational);
 configfs_simple_field!(DeviceConfig, 3, capacity_mib, u64);
@@ -394,3 +404,16 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
 
 configfs_simple_bool_field!(DeviceConfig, 13, bad_blocks_once);
 configfs_simple_bool_field!(DeviceConfig, 14, bad_blocks_partial_io);
+configfs_attribute!(DeviceConfig, 15,
+    show: |this, page| show_field(this.data.lock().cache_size_mib, page),
+    store: |this, page| store_with_power_check(this, page, |data, page| {
+        let text = core::str::from_utf8(page)?.trim();
+        let value = text.parse::<u64>().map_err(|_| EINVAL)?;
+        data.disk_storage = Arc::pin_init(
+            DiskStorage::new(value << 20, data.block_size as usize),
+            GFP_KERNEL
+        )?;
+        data.cache_size_mib = value;
+        Ok(())
+    })
+);
diff --git a/drivers/block/rnull/disk_storage.rs b/drivers/block/rnull/disk_storage.rs
new file mode 100644
index 000000000000..b8fef411fffe
--- /dev/null
+++ b/drivers/block/rnull/disk_storage.rs
@@ -0,0 +1,260 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use super::HwQueueContext;
+use core::pin::Pin;
+use kernel::{
+    block,
+    new_spinlock,
+    new_xarray,
+    page::PAGE_SIZE,
+    prelude::*,
+    sync::{
+        atomic::{ordering, Atomic},
+        SpinLock, SpinLockGuard,
+    },
+    uapi::PAGE_SECTORS,
+    xarray::{
+        self,
+        XArray,
+        XArraySheaf, //
+    }, //
+};
+pub(crate) use page::NullBlockPage;
+
+mod page;
+
+#[pin_data]
+pub(crate) struct DiskStorage {
+    // TODO: Get rid of this pointer indirection.
+    #[pin]
+    trees: SpinLock<Pin<KBox<TreeContainer>>>,
+    cache_size: u64,
+    cache_size_used: Atomic<u64>,
+    next_flush_sector: Atomic<u64>,
+    block_size: usize,
+}
+
+impl DiskStorage {
+    pub(crate) fn new(cache_size: u64, block_size: usize) -> impl PinInit<Self, Error> {
+        try_pin_init!(Self {
+            // TODO: Get rid of the box
+            // https://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git/commit/?h=locking&id=a5d84cafb3e253a11d2e078902c5b090be2f4227
+            trees <- new_spinlock!(KBox::pin_init(TreeContainer::new(), GFP_KERNEL)?),
+            cache_size,
+            cache_size_used: Atomic::new(0),
+            next_flush_sector: Atomic::new(0),
+            block_size,
+        })
+    }
+
+    pub(crate) fn access<'a, 'b, 'c>(
+        &'a self,
+        tree_guard: &'a mut SpinLockGuard<'b, Pin<KBox<TreeContainer>>>,
+        hw_data_guard: &'a mut SpinLockGuard<'b, HwQueueContext>,
+        sheaf: Option<XArraySheaf<'c>>,
+    ) -> DiskStorageAccess<'a, 'b, 'c> {
+        DiskStorageAccess::new(self, tree_guard, hw_data_guard, sheaf)
+    }
+
+    pub(crate) fn lock(&self) -> SpinLockGuard<'_, Pin<KBox<TreeContainer>>> {
+        self.trees.lock()
+    }
+}
+
+pub(crate) struct DiskStorageAccess<'a, 'b, 'c> {
+    cache_guard: xarray::Guard<'a, TreeNode>,
+    disk_guard: xarray::Guard<'a, TreeNode>,
+    hw_data_guard: &'a mut SpinLockGuard<'b, HwQueueContext>,
+    disk_storage: &'a DiskStorage,
+    pub(crate) sheaf: Option<XArraySheaf<'c>>,
+}
+
+impl<'a, 'b, 'c> DiskStorageAccess<'a, 'b, 'c> {
+    fn new(
+        disk_storage: &'a DiskStorage,
+        tree_guard: &'a mut SpinLockGuard<'b, Pin<KBox<TreeContainer>>>,
+        hw_data_guard: &'a mut SpinLockGuard<'b, HwQueueContext>,
+        sheaf: Option<XArraySheaf<'c>>,
+    ) -> Self {
+        Self {
+            cache_guard: tree_guard.cache_tree.lock(),
+            disk_guard: tree_guard.disk_tree.lock(),
+            hw_data_guard,
+            disk_storage,
+            sheaf,
+        }
+    }
+    fn to_index(sector: u64) -> usize {
+        // CAST: Device size limited during setup to (2^32)-1 on 32 bit systems.
+        (sector >> block::PAGE_SECTORS_SHIFT) as usize
+    }
+
+    fn to_sector(index: usize) -> u64 {
+        // CAST: Widen to `u64` before shifting so the shift cannot overflow a 32-bit `usize`.
+        (index as u64) << block::PAGE_SECTORS_SHIFT
+    }
+
+    fn extract_cache_page_inner<'g>(
+        cache_guard: &mut xarray::Guard<'g, TreeNode>,
+        disk_guard: &mut xarray::Guard<'g, TreeNode>,
+        disk_storage: &DiskStorage,
+        hw_data: &mut HwQueueContext,
+        sheaf: Option<&mut XArraySheaf<'_>>,
+    ) -> Result<KBox<NullBlockPage>> {
+        let cache_entry = cache_guard
+            .find_next_entry_circular(
+                disk_storage.next_flush_sector.load(ordering::Relaxed) as usize
+            )
+            .expect("Expected to find a page in the cache");
+
+        let index = cache_entry.index();
+
+        disk_storage
+            .next_flush_sector
+            .store(Self::to_sector(index).wrapping_add(1), ordering::Relaxed);
+
+        disk_storage.cache_size_used.store(
+            disk_storage.cache_size_used.load(ordering::Relaxed) - PAGE_SIZE as u64,
+            ordering::Relaxed,
+        );
+
+        let page = match disk_guard.entry(index) {
+            xarray::Entry::Vacant(disk_entry) => {
+                disk_entry
+                    .insert(cache_entry.remove(), sheaf)
+                    .expect("Preload is set up to allow insert without failure");
+                hw_data.page.take().expect("Preload has allocated for us")
+            }
+            xarray::Entry::Occupied(mut disk_entry) => {
+                let mut page = if cache_entry.is_full() {
+                    disk_entry.insert(cache_entry.remove())
+                } else {
+                    let mut src = cache_entry;
+                    let mut offset = 0;
+                    for _ in 0..PAGE_SECTORS {
+                        src.page_mut().as_pin_mut().copy_to_page(
+                            disk_entry.page_mut().as_pin_mut(),
+                            offset,
+                            block::SECTOR_SIZE as usize,
+                        )?;
+                        offset += block::SECTOR_SIZE as usize;
+                    }
+                    src.remove()
+                };
+                page.reset();
+                page
+            }
+        };
+
+        Ok(page)
+    }
+
+    fn get_cache_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
+        let index = Self::to_index(sector);
+
+        match self.cache_guard.entry(index) {
+            xarray::Entry::Occupied(occupied_entry) => Ok(occupied_entry.into_mut()),
+            xarray::Entry::Vacant(vacant_entry) => {
+                let cache_guard = vacant_entry.into_guard();
+                let page = if self.disk_storage.cache_size_used.load(ordering::Relaxed)
+                    < self.disk_storage.cache_size
+                {
+                    self.hw_data_guard
+                        .page
+                        .take()
+                        .expect("Expected to have a page available")
+                } else {
+                    Self::extract_cache_page_inner(
+                        cache_guard,
+                        &mut self.disk_guard,
+                        self.disk_storage,
+                        self.hw_data_guard,
+                        self.sheaf.as_mut(),
+                    )?
+                };
+                let xarray::Entry::Vacant(vacant_entry) = cache_guard.entry(index) else {
+                    unreachable!("slot was vacant and we hold the lock")
+                };
+                Ok(vacant_entry
+                    .insert(page, self.sheaf.as_mut())
+                    .expect("Should be able to insert"))
+            }
+        }
+    }
+
+    fn get_disk_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
+        let index = Self::to_index(sector);
+
+        let page = match self.disk_guard.entry(index) {
+            xarray::Entry::Vacant(e) => e.insert(
+                self.hw_data_guard
+                    .page
+                    .take()
+                    .expect("Expected page to be available"),
+                self.sheaf.as_mut(),
+            )?,
+            xarray::Entry::Occupied(e) => e.into_mut(),
+        };
+
+        Ok(page)
+    }
+
+    pub(crate) fn get_write_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
+        let page = if self.disk_storage.cache_size > 0 {
+            self.get_cache_page(sector)?
+        } else {
+            self.get_disk_page(sector)?
+        };
+
+        Ok(page)
+    }
+
+    pub(crate) fn get_read_page(&self, sector: u64) -> Option<&NullBlockPage> {
+        let index = Self::to_index(sector);
+        if self.disk_storage.cache_size > 0 {
+            self.cache_guard
+                .get(index)
+                .or_else(|| self.disk_guard.get(index))
+        } else {
+            self.disk_guard.get(index)
+        }
+    }
+
+    fn free_sector_tree(tree_access: &mut xarray::Guard<'_, TreeNode>, sector: u64) {
+        let index = Self::to_index(sector);
+        if let Some(page) = tree_access.get_mut(index) {
+            page.set_free(sector);
+
+            if page.is_empty() {
+                tree_access.remove(index);
+            }
+        }
+    }
+
+    pub(crate) fn free_sector(&mut self, sector: u64) {
+        if self.disk_storage.cache_size > 0 {
+            Self::free_sector_tree(&mut self.cache_guard, sector);
+        }
+
+        Self::free_sector_tree(&mut self.disk_guard, sector);
+    }
+}
+
+type TreeNode = KBox<NullBlockPage>;
+
+#[pin_data]
+pub(crate) struct TreeContainer {
+    #[pin]
+    disk_tree: XArray<TreeNode>,
+    #[pin]
+    cache_tree: XArray<TreeNode>,
+}
+
+impl TreeContainer {
+    fn new() -> impl PinInit<Self> {
+        pin_init!(TreeContainer {
+            disk_tree <- new_xarray!(xarray::AllocKind::Alloc),
+            cache_tree <- new_xarray!(xarray::AllocKind::Alloc),
+        })
+    }
+}
diff --git a/drivers/block/rnull/disk_storage/page.rs b/drivers/block/rnull/disk_storage/page.rs
new file mode 100644
index 000000000000..bc78973ad5d4
--- /dev/null
+++ b/drivers/block/rnull/disk_storage/page.rs
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::{
+    block::{
+        PAGE_SECTOR_MASK,
+        SECTOR_SHIFT, //
+    },
+    memalloc_scope,
+    page::{
+        SafePage,
+        PAGE_SIZE, //
+    },
+    prelude::*,
+    types::Owned,
+    uapi::PAGE_SECTORS, //
+};
+
+static_assert!((PAGE_SIZE >> SECTOR_SHIFT) <= 64);
+
+pub(crate) struct NullBlockPage {
+    page: Owned<SafePage>,
+    status: u64,
+    block_size: usize,
+}
+
+impl NullBlockPage {
+    pub(crate) fn new(block_size: usize) -> Result<KBox<Self>> {
+        memalloc_scope!(let _noio: NoIo);
+        Ok(KBox::new(
+            Self {
+                page: SafePage::alloc_page(__GFP_ZERO)?,
+                status: 0,
+                block_size,
+            },
+            GFP_KERNEL,
+        )?)
+    }
+
+    pub(crate) fn set_occupied(&mut self, sector: u64) {
+        let idx = sector & u64::from(PAGE_SECTOR_MASK);
+        self.status |= 1 << idx;
+    }
+
+    pub(crate) fn set_free(&mut self, sector: u64) {
+        let idx = sector & u64::from(PAGE_SECTOR_MASK);
+        self.status &= !(1 << idx);
+    }
+
+    pub(crate) fn is_empty(&self) -> bool {
+        self.status == 0
+    }
+
+    pub(crate) fn reset(&mut self) {
+        self.status = 0;
+    }
+
+    pub(crate) fn is_full(&self) -> bool {
+        let blocks_per_page = PAGE_SIZE >> self.block_size.trailing_zeros();
+        let shift = PAGE_SECTORS as usize / blocks_per_page;
+
+        for i in 0..blocks_per_page {
+            if self.status & (1 << (i * shift)) == 0 {
+                return false;
+            }
+        }
+
+        true
+    }
+
+    pub(crate) fn page_mut(&mut self) -> &mut Owned<SafePage> {
+        &mut self.page
+    }
+
+    pub(crate) fn page(&self) -> &Owned<SafePage> {
+        &self.page
+    }
+}
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 0c1bc2f5ae9c..877683dba0ac 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -3,13 +3,22 @@
 //! This is a Rust implementation of the C null block driver.
 
 mod configfs;
+mod disk_storage;
 
 use configfs::IRQMode;
+use disk_storage::{
+    DiskStorage,
+    NullBlockPage,
+    TreeContainer, //
+};
 use kernel::{
     bindings,
     block::{
         self,
-        badblocks::{self, BadBlocks},
+        badblocks::{
+            self,
+            BadBlocks, //
+        },
         bio::Segment,
         mq::{
             self,
@@ -20,7 +29,7 @@
             Operations,
             TagSet, //
         },
-        PAGE_SECTOR_MASK, SECTOR_SHIFT,
+        SECTOR_SHIFT,
     },
     error::{
         code,
@@ -29,11 +38,7 @@
     ffi,
     memalloc_scope,
     new_mutex,
-    new_xarray,
-    page::{
-        SafePage,
-        PAGE_SIZE, //
-    },
+    new_spinlock,
     pr_info,
     prelude::*,
     str::CString,
@@ -42,9 +47,11 @@
         atomic::{
             ordering,
             Atomic, //
         },
         Arc,
-        Mutex, //
+        Mutex,
+        SpinLock,
+        SpinLockGuard, //
     },
     time::{
         hrtimer::{
@@ -59,7 +66,7 @@
         OwnableRefCounted,
         Owned, //
     },
-    xarray::XArray, //
+    xarray::XArraySheaf, //
 };
 
 module! {
@@ -146,9 +153,11 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
                 } else {
                     module_parameters::submit_queues.value()
                 };
+
+                let block_size = module_parameters::bs.value();
                 let disk = NullBlkDevice::new(NullBlkOptions {
                     name: &name,
-                    block_size: module_parameters::bs.value(),
+                    block_size,
                     rotational: module_parameters::rotational.value(),
                     capacity_mib: module_parameters::gb.value() * 1024,
                     irq_mode: module_parameters::irqmode.value().try_into()?,
@@ -161,6 +170,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
                     bad_blocks: Arc::pin_init(BadBlocks::new(false), GFP_KERNEL)?,
                     bad_blocks_once: false,
                     bad_blocks_partial_io: false,
+                    storage: Arc::pin_init(DiskStorage::new(0, block_size as usize), GFP_KERNEL)?,
                 })?;
                 disks.push(disk, GFP_KERNEL)?;
             }
@@ -190,8 +200,20 @@ struct NullBlkOptions<'a> {
     bad_blocks: Arc<BadBlocks>,
     bad_blocks_once: bool,
     bad_blocks_partial_io: bool,
+    storage: Arc<DiskStorage>,
+}
+
+#[pin_data]
+struct NullBlkDevice {
+    storage: Arc<DiskStorage>,
+    irq_mode: IRQMode,
+    completion_time: Delta,
+    memory_backed: bool,
+    block_size: usize,
+    bad_blocks: Arc<BadBlocks>,
+    bad_blocks_once: bool,
+    bad_blocks_partial_io: bool,
 }
-struct NullBlkDevice;
 
 impl NullBlkDevice {
     fn new(options: NullBlkOptions<'_>) -> Result<GenDisk<Self>> {
@@ -210,6 +232,7 @@ fn new(options: NullBlkOptions<'_>) -> Result<GenDisk<Self>> {
             bad_blocks,
             bad_blocks_once,
             bad_blocks_partial_io,
+            storage,
         } = options;
 
         let mut flags = mq::tag_set::Flags::default();
@@ -244,13 +267,13 @@ fn new(options: NullBlkOptions<'_>) -> Result<GenDisk<Self>> {
             GFP_KERNEL,
         )?;
 
-        let queue_data = Box::pin_init(
-            pin_init!(QueueData {
-                tree <- new_xarray!(kernel::xarray::AllocKind::Alloc),
+        let queue_data = Box::try_pin_init(
+            try_pin_init!(Self {
+                storage,
                 irq_mode,
                 completion_time,
                 memory_backed,
-                block_size: block_size.into(),
+                block_size: block_size as usize,
                 bad_blocks,
                 bad_blocks_once,
                 bad_blocks_partial_io,
@@ -273,22 +296,68 @@ fn new(options: NullBlkOptions<'_>) -> Result<GenDisk<Self>> {
         builder.build(fmt!("{}", name.to_str()?), tagset, queue_data)
     }
 
+    fn sheaf_size() -> usize {
+        2 * ((usize::BITS as usize / bindings::XA_CHUNK_SHIFT)
+            + if (usize::BITS as usize % bindings::XA_CHUNK_SHIFT) == 0 {
+                0
+            } else {
+                1
+            })
+    }
+
+    fn preload<'b, 'c>(
+        tree_guard: &'b mut SpinLockGuard<'c, Pin<KBox<TreeContainer>>>,
+        hw_data_guard: &'b mut SpinLockGuard<'c, HwQueueContext>,
+        block_size: usize,
+        sheaf: &'b mut Option<XArraySheaf<'c>>,
+    ) -> Result {
+        match sheaf {
+            Some(sheaf) => {
+                tree_guard.do_unlocked(|| {
+                    hw_data_guard.do_unlocked(|| sheaf.refill(GFP_KERNEL, Self::sheaf_size()))
+                })?;
+            }
+            None => {
+                let _ = sheaf.insert(
+                    kernel::xarray::xarray_kmem_cache()
+                        .sheaf(Self::sheaf_size(), GFP_NOWAIT)
+                        .or(tree_guard.do_unlocked(|| {
+                            hw_data_guard.do_unlocked(|| -> Result<_> {
+                                kernel::xarray::xarray_kmem_cache()
+                                    .sheaf(Self::sheaf_size(), GFP_KERNEL)
+                            })
+                        }))?,
+                );
+            }
+        }
+
+        // Another thread may get the lock after we allocate. If this happens, retry.
+        while hw_data_guard.page.is_none() {
+            hw_data_guard.page =
+                Some(tree_guard.do_unlocked(|| {
+                    hw_data_guard.do_unlocked(|| NullBlockPage::new(block_size))
+                })?);
+        }
+
+        Ok(())
+    }
+
     #[inline(always)]
-    fn write(tree: &XArray<TreeNode>, mut sector: u64, mut segment: Segment<'_>) -> Result {
-        while !segment.is_empty() {
-            let page = NullBlockPage::new()?;
-            let mut tree = tree.lock();
+    fn write<'a, 'b, 'c>(
+        &'a self,
+        tree_guard: &'b mut SpinLockGuard<'c, Pin<KBox<TreeContainer>>>,
+        hw_data_guard: &'b mut SpinLockGuard<'c, HwQueueContext>,
+        mut sector: u64,
+        mut segment: Segment<'_>,
+    ) -> Result {
+        let mut sheaf: Option<XArraySheaf<'_>> = None;
 
-            // CAST: Device size limited during setup to (2^32)-1 on 32 bit systems.
-            let page_idx = (sector >> block::PAGE_SECTORS_SHIFT) as usize;
+        while !segment.is_empty() {
+            Self::preload(tree_guard, hw_data_guard, self.block_size, &mut sheaf)?;
 
-            let page = if let Some(page) = tree.get_mut(page_idx) {
-                page
-            } else {
-                tree.store(page_idx, page, GFP_KERNEL)?;
-                tree.get_mut(page_idx).unwrap()
-            };
+            let mut access = self.storage.access(tree_guard, hw_data_guard, sheaf);
 
+            let page = access.get_write_page(sector)?;
             page.set_occupied(sector);
 
             // CAST: Page offset always fits in 32 bits.
@@ -296,58 +365,73 @@ fn write(tree: &XArray<TreeNode>, mut sector: u64, mut segment: Segment<'_>) ->
                 ((sector & u64::from(block::PAGE_SECTOR_MASK)) << block::SECTOR_SHIFT) as usize;
 
             // CAST: Casting from `usize` to `u64` never overflows.
-            sector += segment.copy_to_page(page.page.as_pin_mut(), page_offset) as u64
+            sector += segment.copy_to_page(page.page_mut().as_pin_mut(), page_offset) as u64
                 >> block::SECTOR_SHIFT;
+
+            sheaf = access.sheaf;
         }
+
+        if let Some(sheaf) = sheaf {
+            tree_guard.do_unlocked(|| {
+                hw_data_guard.do_unlocked(|| {
+                    sheaf.return_refill(GFP_KERNEL);
+                })
+            });
+        }
+
         Ok(())
     }
 
     #[inline(always)]
-    fn read(tree: &XArray<TreeNode>, mut sector: u64, mut segment: Segment<'_>) -> Result {
-        let tree = tree.lock();
+    fn read<'a, 'b, 'c>(
+        &'a self,
+        tree_guard: &'b mut SpinLockGuard<'c, Pin<KBox<TreeContainer>>>,
+        hw_data_guard: &'b mut SpinLockGuard<'c, HwQueueContext>,
+        mut sector: u64,
+        mut segment: Segment<'_>,
+    ) -> Result {
+        let access = self.storage.access(tree_guard, hw_data_guard, None);
 
         while !segment.is_empty() {
-            // CAST: Device size limited during setup to (2^32)-1 on 32 bit systems.
-            let page_idx = (sector >> block::PAGE_SECTORS_SHIFT) as usize;
+            let page = access.get_read_page(sector);
 
-            if let Some(page) = tree.get(page_idx) {
-                // CAST: Page offset always fits in 32 bits.
-                let page_offset =
-                    ((sector & u64::from(block::PAGE_SECTOR_MASK)) << block::SECTOR_SHIFT) as usize;
+            match page {
+                Some(page) => {
+                    // CAST: Page offset always fits in 32 bits.
+                    let page_offset = ((sector & u64::from(block::PAGE_SECTOR_MASK))
+                        << block::SECTOR_SHIFT) as usize;
 
+                    // CAST: Casting from `usize` to `u64` never overflows.
+                    sector += segment.copy_from_page(page.page(), page_offset) as u64
+                        >> block::SECTOR_SHIFT;
+                }
                 // CAST: Casting from `usize` to `u64` never overflows.
-                sector +=
-                    segment.copy_from_page(&page.page, page_offset) as u64 >> block::SECTOR_SHIFT;
-            } else {
-                // CAST: Casting from `usize` to `u64` never overflows.
-                sector += segment.zero_page() as u64 >> block::SECTOR_SHIFT;
+                None => sector += segment.zero_page() as u64 >> block::SECTOR_SHIFT,
             }
         }
 
         Ok(())
     }
 
-    fn discard(tree: &XArray<TreeNode>, mut sector: u64, sectors: u64, block_size: u64) -> Result {
-        let mut remaining_bytes = sectors << SECTOR_SHIFT;
-        let mut tree = tree.lock();
+    fn discard(
+        &self,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
+        mut sector: u64,
+        sectors: u32,
+    ) -> Result {
+        let mut tree_guard = self.storage.lock();
+        let mut hw_data_guard = hw_data.lock();
 
-        while remaining_bytes > 0 {
-            // CAST: Device size limited during setup to (2^32)-1 on 32 bit systems.
-            let page_idx = (sector >> block::PAGE_SECTORS_SHIFT) as usize;
-            let mut remove = false;
-            if let Some(page) = tree.get_mut(page_idx) {
-                page.set_free(sector);
-                if page.is_empty() {
-                    remove = true;
-                }
-            }
+        let mut access = self
+            .storage
+            .access(&mut tree_guard, &mut hw_data_guard, None);
 
-            if remove {
-                drop(tree.remove(page_idx))
-            }
+        let mut remaining_bytes = (sectors as usize) << SECTOR_SHIFT;
 
-            let processed = remaining_bytes.min(block_size);
-            sector += processed >> SECTOR_SHIFT;
+        while remaining_bytes > 0 {
+            access.free_sector(sector);
+            let processed = remaining_bytes.min(self.block_size);
+            sector += (processed >> SECTOR_SHIFT) as u64;
             remaining_bytes -= processed;
         }
 
@@ -356,14 +440,19 @@ fn discard(tree: &XArray<TreeNode>, mut sector: u64, sectors: u64, block_size: u
 
     #[inline(never)]
     fn transfer(
+        &self,
+        hw_data: &Pin<&SpinLock<HwQueueContext>>,
         rq: &mut Owned<mq::Request<Self>>,
-        tree: &XArray<TreeNode>,
         max_sectors: u32,
     ) -> Result {
         let mut sector = rq.sector();
         let max_end_sector = sector + <u32 as Into<u64>>::into(max_sectors);
         let command = rq.command();
 
+        // TODO: Use `PerCpu` to get rid of this lock
+        let mut hw_data_guard = hw_data.lock();
+        let mut tree_guard = self.storage.lock();
+
         for bio in rq.bio_iter_mut() {
             let segment_iter = bio.segment_iter();
             for mut segment in segment_iter {
@@ -373,8 +462,12 @@ fn transfer(
                 let length_sectors_allowed = segment_length_sectors.min(max_remaining_sectors);
                 segment.truncate(length_sectors_allowed << SECTOR_SHIFT);
                 match command {
-                    bindings::req_op_REQ_OP_WRITE => Self::write(tree, sector, segment)?,
-                    bindings::req_op_REQ_OP_READ => Self::read(tree, sector, segment)?,
+                    bindings::req_op_REQ_OP_WRITE => {
+                        self.write(&mut tree_guard, &mut hw_data_guard, sector, segment)?
+                    }
+                    bindings::req_op_REQ_OP_READ => {
+                        self.read(&mut tree_guard, &mut hw_data_guard, sector, segment)?
+                    }
                     _ => (),
                 }
                 sector += u64::from(length_sectors_allowed);
@@ -384,29 +477,26 @@ fn transfer(
                 }
             }
         }
+
         Ok(())
     }
 
-    fn handle_bad_blocks(
-        rq: &mut Owned<mq::Request<Self>>,
-        queue_data: &QueueData,
-        sectors: &mut u32,
-    ) -> Result {
-        if queue_data.bad_blocks.enabled() {
+    fn handle_bad_blocks(&self, rq: &mut Owned<mq::Request<Self>>, sectors: &mut u32) -> Result {
+        if self.bad_blocks.enabled() {
             let start = rq.sector();
             let end = start + u64::from(*sectors);
-            match queue_data.bad_blocks.check(start..end) {
+            match self.bad_blocks.check(start..end) {
                 badblocks::BlockStatus::None => {}
                 badblocks::BlockStatus::Acknowledged(mut range)
                 | badblocks::BlockStatus::Unacknowledged(mut range) => {
                     rq.data_ref().error.store(1, ordering::Relaxed);
 
-                    if queue_data.bad_blocks_once {
-                        queue_data.bad_blocks.set_good(range.clone())?;
+                    if self.bad_blocks_once {
+                        self.bad_blocks.set_good(range.clone())?;
                     }
 
-                    if queue_data.bad_blocks_partial_io {
-                        let block_size_sectors = queue_data.block_size >> SECTOR_SHIFT;
+                    if self.bad_blocks_partial_io {
+                        let block_size_sectors = (self.block_size >> SECTOR_SHIFT) as u64;
                         range.start = align_down(range.start, block_size_sectors);
                         if start < range.start {
                             *sectors = (range.start - start) as u32;
@@ -431,52 +521,8 @@ fn end_request(rq: Owned<mq::Request<Self>>) {
     }
 }
 
-static_assert!((PAGE_SIZE >> SECTOR_SHIFT) <= 64);
-
-struct NullBlockPage {
-    page: Owned<SafePage>,
-    status: u64,
-}
-
-impl NullBlockPage {
-    fn new() -> Result<KBox<Self>> {
-        Ok(KBox::new(
-            Self {
-                page: SafePage::alloc_page(GFP_KERNEL | __GFP_ZERO)?,
-                status: 0,
-            },
-            GFP_KERNEL,
-        )?)
-    }
-
-    fn set_occupied(&mut self, sector: u64) {
-        let idx = sector & u64::from(PAGE_SECTOR_MASK);
-        self.status |= 1 << idx;
-    }
-
-    fn set_free(&mut self, sector: u64) {
-        let idx = sector & u64::from(PAGE_SECTOR_MASK);
-        self.status &= !(1 << idx);
-    }
-
-    fn is_empty(&self) -> bool {
-        self.status == 0
-    }
-}
-
-type TreeNode = KBox<NullBlockPage>;
-
-#[pin_data]
-struct QueueData {
-    #[pin]
-    tree: XArray<TreeNode>,
-    irq_mode: IRQMode,
-    completion_time: Delta,
-    memory_backed: bool,
-    block_size: u64,
-    bad_blocks: Arc<BadBlocks>,
-    bad_blocks_once: bool,
-    bad_blocks_partial_io: bool,
+struct HwQueueContext {
+    page: Option<KBox<disk_storage::NullBlockPage>>,
 }
 
 #[pin_data]
@@ -531,10 +577,10 @@ fn align_down<T>(value: T, to: T) -> T
 
 #[vtable]
 impl Operations for NullBlkDevice {
-    type QueueData = Pin<KBox<QueueData>>;
+    type QueueData = Pin<KBox<Self>>;
     type RequestData = Pdu;
     type TagSetData = ();
-    type HwData = ();
+    type HwData = Pin<KBox<SpinLock<HwQueueContext>>>;
 
     fn new_request_data() -> impl PinInit<Self::RequestData> {
         pin_init!(Pdu {
@@ -545,42 +591,40 @@ fn new_request_data() -> impl PinInit<Self::RequestData> {
 
     #[inline(always)]
     fn queue_rq(
-        _hw_data: (),
-        queue_data: Pin<&QueueData>,
+        hw_data: Pin<&SpinLock<HwQueueContext>>,
+        this: Pin<&Self>,
         mut rq: Owned<mq::Request<Self>>,
         _is_last: bool,
     ) -> Result {
         let mut sectors = rq.sectors();
 
-        Self::handle_bad_blocks(&mut rq, queue_data.get_ref(), &mut sectors)?;
+        Self::handle_bad_blocks(this.get_ref(), &mut rq, &mut sectors)?;
 
-        if queue_data.memory_backed {
+        if this.memory_backed {
             memalloc_scope!(let _noio: NoIo);
-            let tree = &queue_data.tree;
-
             if rq.command() == bindings::req_op_REQ_OP_DISCARD {
-                Self::discard(tree, rq.sector(), sectors.into(), queue_data.block_size)?;
+                this.discard(&hw_data, rq.sector(), sectors)?;
             } else {
-                Self::transfer(&mut rq, tree, sectors)?;
+                this.transfer(&hw_data, &mut rq, sectors)?;
             }
         }
 
-        match queue_data.irq_mode {
+        match this.irq_mode {
             IRQMode::None => Self::end_request(rq),
             IRQMode::Soft => mq::Request::complete(rq.into()),
             IRQMode::Timer => {
                 OwnableRefCounted::into_shared(rq)
-                    .start(queue_data.completion_time)
+                    .start(this.completion_time)
                     .dismiss();
             }
         }
         Ok(())
     }
 
-    fn commit_rqs(_hw_data: (), _queue_data: Pin<&QueueData>) {}
+    fn commit_rqs(_hw_data: Pin<&SpinLock<HwQueueContext>>, _queue_data: Pin<&Self>) {}
 
-    fn init_hctx(_tagset_data: (), _hctx_idx: u32) -> Result {
-        Ok(())
+    fn init_hctx(_tagset_data: (), _hctx_idx: u32) -> Result<Self::HwData> {
+        KBox::pin_init(new_spinlock!(HwQueueContext { page: None }), GFP_KERNEL)
     }
 
     fn complete(rq: ARef<mq::Request<Self>>) {

-- 
2.51.2
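
The sector arithmetic used in the read/write and discard paths above can
be checked in isolation. The following is a minimal userspace sketch, not
the driver code: the constant values (512-byte sectors, 4 KiB pages) and
the helper names page_index(), page_offset() and discard_steps() are
assumptions made for illustration, and the storage access is replaced by
recording which sectors the discard loop would visit.

```rust
// Assumed values for illustration: 512-byte sectors and 4 KiB pages.
const SECTOR_SHIFT: u32 = 9;
const PAGE_SHIFT: u32 = 12;
const PAGE_SECTORS_SHIFT: u32 = PAGE_SHIFT - SECTOR_SHIFT;
const PAGE_SECTOR_MASK: u64 = (1 << PAGE_SECTORS_SHIFT) - 1;

/// Index of the backing page that holds `sector` (hypothetical helper).
fn page_index(sector: u64) -> usize {
    (sector >> PAGE_SECTORS_SHIFT) as usize
}

/// Byte offset of `sector` inside its backing page (hypothetical helper).
fn page_offset(sector: u64) -> usize {
    ((sector & PAGE_SECTOR_MASK) << SECTOR_SHIFT) as usize
}

/// Mirrors the shape of the discard loop: walk the range one block at a
/// time and record the first sector of each step instead of freeing it.
/// `block_size` is in bytes and must be non-zero, otherwise the loop
/// would never make progress.
fn discard_steps(mut sector: u64, sectors: u32, block_size: usize) -> Vec<u64> {
    let mut remaining_bytes = (sectors as usize) << SECTOR_SHIFT;
    let mut visited = Vec::new();

    while remaining_bytes > 0 {
        visited.push(sector);
        let processed = remaining_bytes.min(block_size);
        sector += (processed >> SECTOR_SHIFT) as u64;
        remaining_bytes -= processed;
    }

    visited
}
```

With these assumed constants, sector 9 lives in page 1 at byte offset 512,
and a 16-sector discard with a 4096-byte block size takes two steps.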




* [PATCH v2 66/83] block: rust: add `TagSet::update_hw_queue_count`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
  To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
	Boqun Feng, Lorenzo Stoakes
  Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
	rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>

Add a method to `TagSet` that allows changing the number of hardware queues
dynamically.

Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
 rust/kernel/block/mq/tag_set.rs | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/rust/kernel/block/mq/tag_set.rs b/rust/kernel/block/mq/tag_set.rs
index 858c1b952b00..e89c76987b54 100644
--- a/rust/kernel/block/mq/tag_set.rs
+++ b/rust/kernel/block/mq/tag_set.rs
@@ -170,6 +170,20 @@ pub fn hw_queue_count(&self) -> u32 {
         unsafe { (*self.inner.get()).nr_hw_queues }
     }
 
+    /// Update the number of hardware queues for this tag set.
+    ///
+    /// This operation may fail if memory for tags cannot be allocated.
+    pub fn update_hw_queue_count(&self, nr_hw_queues: u32) -> Result {
+        // SAFETY: blk_mq_update_nr_hw_queues applies internal synchronization.
+        unsafe { bindings::blk_mq_update_nr_hw_queues(self.inner.get(), nr_hw_queues) }
+
+        if self.hw_queue_count() == nr_hw_queues {
+            Ok(())
+        } else {
+            Err(ENOMEM)
+        }
+    }
+
     /// Borrow the [`T::TagSetData`] associated with this tag set.
     pub fn data(&self) -> <T::TagSetData as ForeignOwnable>::Borrowed<'_> {
         // SAFETY: By type invariant, `self.inner` is valid.

-- 
2.51.2
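
The method above calls a C routine that returns nothing and then infers
success by reading the queue count back. A minimal userspace sketch of
that "update, then verify" shape follows. MockTagSet, raw_update(),
limit and UpdateFailed are hypothetical names that stand in for the real
tag set and for `ENOMEM`, and the mock assumes that a failed update
leaves the previous count in place.

```rust
use std::cell::Cell;

/// Hypothetical error type; the patch returns `ENOMEM` instead.
#[derive(Debug, PartialEq)]
struct UpdateFailed;

/// Hypothetical stand-in for a tag set. `limit` models how many queues
/// the mock is able to set up; it is not a real kernel concept.
struct MockTagSet {
    nr_hw_queues: Cell<u32>,
    limit: u32,
}

impl MockTagSet {
    /// Stand-in for the C routine: it returns nothing, and in this mock
    /// a failed update leaves the old count untouched (an assumption).
    fn raw_update(&self, nr_hw_queues: u32) {
        if nr_hw_queues >= 1 && nr_hw_queues <= self.limit {
            self.nr_hw_queues.set(nr_hw_queues);
        }
    }

    fn hw_queue_count(&self) -> u32 {
        self.nr_hw_queues.get()
    }

    /// Same shape as the method in the patch: perform the update, then
    /// compare the resulting count with the requested one.
    fn update_hw_queue_count(&self, nr_hw_queues: u32) -> Result<(), UpdateFailed> {
        self.raw_update(nr_hw_queues);

        if self.hw_queue_count() == nr_hw_queues {
            Ok(())
        } else {
            Err(UpdateFailed)
        }
    }
}
```

One consequence of the read-back check, at least in this sketch, is that
requesting the count that is already in effect always reports success.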




* Re: [PATCH v2 01/83] block: rust: fix `Send` bound for `GenDisk`
From: Yuan Tan @ 2026-06-09 21:45 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
	Björn Roy Baron, Boqun Feng, Danilo Krummrich,
	FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
	John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
	Stephen Boyd, Thomas Gleixner, Trevor Gross, linux-block,
	linux-kernel, linux-mm, rust-for-linux, Priya Bala Govindasamy,
	Dylan Zueck, Yuan Tan
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-1-82c7404542e2@kernel.org>

On Tue, Jun 9, 2026 at 12:13 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> The `Send` implementation for `GenDisk<T>` was conditioned on `T: Send`.
> This constrains the wrong type. `T` is the `Operations` implementation,
> which is typically a zero-sized marker type that carries no data, so `T:
> Send` says nothing about whether the data a `GenDisk` actually owns can be
> moved to another thread.
>
> A `GenDisk<T>` owns the queue data `T::QueueData` (stored as the
> `gendisk`'s `queuedata` and dropped when the `GenDisk` is dropped) and an
> `Arc<TagSet<T>>`. These are the values transferred when a `GenDisk` is sent
> across a thread boundary, so the `Send` bound must constrain exactly them.
> Bound `T::QueueData: Send` and `Arc<TagSet<T>>: Send` instead.
>
> Fixes: 3253aba3408a ("rust: block: introduce `kernel::block::mq` module")
> Suggested-by: Yuan Tan <ytan089@ucr.edu>
> Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
> ---
>
> Please take patch from Yuan instead of this one, if they send a fixed
> version [1].
>
> [1] https://lore.kernel.org/r/8839ddc5ff54bf454d508cde91d27d00fc3e2dd8.1780633578.git.ytan089@ucr.edu

My last email was mistakenly sent as HTML, so I am resending it. I hope
this doesn't disturb anyone.

Sorry, I've been busy with other things and haven't had the chance to
send the fixed version.

Thank you very much for reviewing the patch and for preparing the v2 version.

Could you please add the following when applying this patch?
Reported-by: Priya Bala Govindasamy <pgovind2@uci.edu>
Reported-by: Dylan Zueck <dzueck@uci.edu>

I didn't discover this issue myself. I just helped write the patch and
I don't want them to lose their credit for it.

Please let me know if you would prefer that I send a v3 instead.


> ---
>  rust/kernel/block/mq/gen_disk.rs | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
> index 912cb805caf5..b36d24382cc3 100644
> --- a/rust/kernel/block/mq/gen_disk.rs
> +++ b/rust/kernel/block/mq/gen_disk.rs
> @@ -199,8 +199,14 @@ pub struct GenDisk<T: Operations> {
>  }
>
>  // SAFETY: `GenDisk` is an owned pointer to a `struct gendisk` and an `Arc` to a
> -// `TagSet` It is safe to send this to other threads as long as T is Send.
> -unsafe impl<T: Operations + Send> Send for GenDisk<T> {}
> +// `TagSet`. It is safe to send this to other threads as long as these two are `Send`.
> +unsafe impl<T> Send for GenDisk<T>
> +where
> +    T: Operations,
> +    T::QueueData: Send,
> +    Arc<TagSet<T>>: Send,
> +{
> +}
>
>  impl<T: Operations> Drop for GenDisk<T> {
>      fn drop(&mut self) {
>
> --
> 2.51.2
>
>
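
For readers following the thread, the effect of the bound change in the
quoted patch can be reproduced in plain userspace Rust. This is only a
sketch under assumptions: the Operations trait and GenDisk struct here
are simplified stand-ins with a single raw pointer field, the TagSet
half of the bound is left out, and ArcOps, RcOps, assert_send() and
check() are hypothetical names.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Simplified stand-in for the block layer `Operations` trait.
trait Operations {
    type QueueData;
}

/// Simplified stand-in for `GenDisk`. The raw pointer makes the struct
/// `!Send` by default, so an explicit `unsafe impl` is required.
#[allow(dead_code)]
struct GenDisk<T: Operations> {
    queue_data: *mut T::QueueData,
}

// SAFETY (sketch only): the only data owned through the pointer is
// `T::QueueData`, so that is the type the bound constrains.
unsafe impl<T> Send for GenDisk<T>
where
    T: Operations,
    T::QueueData: Send,
{
}

/// Marker type whose queue data can be sent across threads.
struct ArcOps;
impl Operations for ArcOps {
    type QueueData = Arc<u32>;
}

/// Marker type that is itself `Send`, but whose queue data is not.
#[allow(dead_code)]
struct RcOps;
impl Operations for RcOps {
    type QueueData = Rc<u32>;
}

/// Compiles only if `S` is `Send`.
fn assert_send<S: Send>() {}

fn check() -> bool {
    assert_send::<GenDisk<ArcOps>>();
    // The next line is rejected by the compiler with the bound above,
    // but would be accepted with a bound of `T: Operations + Send`:
    // assert_send::<GenDisk<RcOps>>();
    true
}
```

RcOps is a zero-sized marker and therefore `Send`, which is why a bound
on `T` alone says nothing about the `Rc` stored as queue data.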


* Re: [PATCH v2] nvmet-rdma: handle inline data with a nonzero offset
From: Keith Busch @ 2026-06-09 22:00 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni, linux-nvme,
	linux-rdma, linux-block
In-Reply-To: <20260604193645.178350-1-hexlabsecurity@proton.me>

On Thu, Jun 04, 2026 at 07:36:54PM +0000, Bryam Vargas wrote:
> nvmet_rdma_use_inline_sg() maps the host-controlled inline data offset
> into the per-command inline scatterlist.  The bounds check admits any
> offset with off + len <= inline_data_size, but the mapping still assumes
> the data begins in the first inline page:

Thanks, applied to nvme-7.2.

And not necessarily directed at you, since apparently many people do
this, but it would help me a great deal if subsequent versions were
posted as a new thread rather than appended to the previous one. The
interleaving of the intermediate versions just makes this harder to sift
through.

I'm actually not even sure how so many people converged on this
anti-pattern, as 'git send-email' would have naturally created a new
thread for each new version. What exactly are people doing here?


* [PATCH 00/27] Enable lock context analysis in drivers/block/
From: Bart Van Assche @ 2026-06-09 22:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Marco Elver, Bart Van Assche

Hi Jens,

This patch series enables lock context analysis in all block drivers in the
drivers/block/ directory. Please consider these patches for the next merge
window.

Thanks,

Bart.

Bart Van Assche (27):
  aoe: Enable lock context analysis
  drbd: Remove "extern" from function declarations
  drbd: Retain one _get_ldev_if_state() implementation
  drbd: Remove the get_ldev_if_state() macro
  drbd: Remove the 'local' lock context
  drbd: Simplify the bitmap locking functions.
  drbd: Move two declarations
  drbd: Pass 'resource' directly to complete_conflicting_writes()
  drbd: Split drbd_nl_get_connections_dumpit()
  drbd: Make a mutex_unlock() call unconditional
  drbd: Split drbd_req_state()
  drbd: Convert drbd_req_state() to unconditional locking
  drbd: Annotate drbd_bm_{lock,unlock}()
  drbd: Enable lock context analysis
  loop: Split loop_change_fd()
  loop: Split loop_configure()
  loop: Remove the "bool global" function argument
  loop: Add lock context annotations
  mtip32: Enable lock context analysis
  nbd: Enable lock context analysis
  null_blk: Enable lock context analysis
  rbd: Enable lock context analysis
  ublk: Enable lock context analysis
  xen-blkback: Enable lock context analysis
  zram: Enable lock context analysis
  rnbd: Enable lock context analysis
  block: Enable lock context analysis for all block drivers

 drivers/block/Makefile                 |   2 +
 drivers/block/aoe/Makefile             |   2 +
 drivers/block/aoe/aoecmd.c             |   1 +
 drivers/block/drbd/Makefile            |   3 +
 drivers/block/drbd/drbd_bitmap.c       |  43 +-
 drivers/block/drbd/drbd_config.h       |   2 +-
 drivers/block/drbd/drbd_int.h          | 594 +++++++++++++------------
 drivers/block/drbd/drbd_interval.h     |  11 +-
 drivers/block/drbd/drbd_main.c         |  41 +-
 drivers/block/drbd/drbd_nl.c           | 117 ++---
 drivers/block/drbd/drbd_receiver.c     |  27 +-
 drivers/block/drbd/drbd_req.c          |  10 +-
 drivers/block/drbd/drbd_req.h          |  24 +-
 drivers/block/drbd/drbd_state.c        |  50 ++-
 drivers/block/drbd/drbd_state.h        |  41 +-
 drivers/block/drbd/drbd_state_change.h |  32 +-
 drivers/block/drbd/drbd_worker.c       |   6 +-
 drivers/block/loop.c                   | 251 ++++++-----
 drivers/block/mtip32xx/Makefile        |   2 +
 drivers/block/nbd.c                    |   3 +
 drivers/block/null_blk/Makefile        |   2 +
 drivers/block/null_blk/main.c          |  12 +-
 drivers/block/null_blk/zoned.c         |   2 +
 drivers/block/rbd.c                    |   8 +
 drivers/block/rnbd/Makefile            |   2 +
 drivers/block/ublk_drv.c               |   6 +
 drivers/block/xen-blkback/Makefile     |   3 +
 drivers/block/zram/Makefile            |   2 +
 drivers/block/zram/zcomp.c             |   3 +-
 drivers/block/zram/zcomp.h             |   6 +-
 30 files changed, 700 insertions(+), 608 deletions(-)



* [PATCH 01/27] aoe: Enable lock context analysis
From: Bart Van Assche @ 2026-06-09 22:04 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Marco Elver, Bart Van Assche,
	Christoph Hellwig, Justin Sanders
In-Reply-To: <cover.1781042470.git.bvanassche@acm.org>

Add a missing __must_hold() annotation. Enable lock context analysis in the
Makefile.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/aoe/Makefile | 2 ++
 drivers/block/aoe/aoecmd.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/block/aoe/Makefile b/drivers/block/aoe/Makefile
index b7545ce2f1b0..27bff6359a56 100644
--- a/drivers/block/aoe/Makefile
+++ b/drivers/block/aoe/Makefile
@@ -3,5 +3,7 @@
 # Makefile for ATA over Ethernet
 #
 
+CONTEXT_ANALYSIS := y
+
 obj-$(CONFIG_ATA_OVER_ETH)	+= aoe.o
 aoe-y := aoeblk.o aoechr.o aoecmd.o aoedev.o aoemain.o aoenet.o
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index a4744a30a8af..54c57b9f8894 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -1193,6 +1193,7 @@ noskb:		if (buf)
  */
 static int
 ktio(int id)
+	__must_hold(&iocq[id].lock)
 {
 	struct frame *f;
 	struct list_head *pos;


* [PATCH 03/27] drbd: Retain one _get_ldev_if_state() implementation
From: Bart Van Assche @ 2026-06-09 22:04 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Marco Elver, Bart Van Assche,
	Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
	Nathan Chancellor
In-Reply-To: <cover.1781042470.git.bvanassche@acm.org>

There are two slightly different _get_ldev_if_state() implementations
in the DRBD source code: an inline version for regular builds and an
out-of-line version that is only built if __CHECKER__ is defined. Keep
the inline version because that is the one regular builds use. This does
not affect C=1 / C=2 builds because lock context checking is performed by
Clang instead of sparse since commit 5b63d0ae94cc ("compiler-context-
analysis: Remove Sparse support").

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/drbd/drbd_int.h  |  7 ++-----
 drivers/block/drbd/drbd_main.c | 19 -------------------
 2 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index aa39c4d19133..bba52252fbac 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -2033,8 +2033,8 @@ static inline void put_ldev(struct drbd_device *device)
 	}
 }
 
-#ifndef __CHECKER__
-static inline int _get_ldev_if_state(struct drbd_device *device, enum drbd_disk_state mins)
+static inline int _get_ldev_if_state(struct drbd_device *device,
+				     enum drbd_disk_state mins)
 {
 	int io_allowed;
 
@@ -2048,9 +2048,6 @@ static inline int _get_ldev_if_state(struct drbd_device *device, enum drbd_disk_
 		put_ldev(device);
 	return io_allowed;
 }
-#else
-int _get_ldev_if_state(struct drbd_device *device, enum drbd_disk_state mins);
-#endif
 
 /* this throttles on-the-fly application requests
  * according to max_buffers settings;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a2a841c89201..dbf6e413db03 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -128,25 +128,6 @@ static const struct block_device_operations drbd_ops = {
 	.release	= drbd_release,
 };
 
-#ifdef __CHECKER__
-/* When checking with sparse, and this is an inline function, sparse will
-   give tons of false positives. When this is a real functions sparse works.
- */
-int _get_ldev_if_state(struct drbd_device *device, enum drbd_disk_state mins)
-{
-	int io_allowed;
-
-	atomic_inc(&device->local_cnt);
-	io_allowed = (device->state.disk >= mins);
-	if (!io_allowed) {
-		if (atomic_dec_and_test(&device->local_cnt))
-			wake_up(&device->misc_wait);
-	}
-	return io_allowed;
-}
-
-#endif
-
 /**
  * tl_release() - mark as BARRIER_ACKED all requests in the corresponding transfer log epoch
  * @connection:	DRBD connection.


* [PATCH 04/27] drbd: Remove the get_ldev_if_state() macro
From: Bart Van Assche @ 2026-06-09 22:04 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Marco Elver, Bart Van Assche,
	Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
In-Reply-To: <cover.1781042470.git.bvanassche@acm.org>

The get_ldev_if_state() macro was introduced because sparse does not
support something like __cond_acquires(). Remove this macro and rename
_get_ldev_if_state() to get_ldev_if_state().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/drbd/drbd_int.h | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index bba52252fbac..49383c309865 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -2002,9 +2002,6 @@ static inline bool is_sync_state(enum drbd_conns connection_state)
  *
  * You have to call put_ldev() when finished working with device->ldev.
  */
-#define get_ldev_if_state(_device, _min_state)				\
-	(_get_ldev_if_state((_device), (_min_state)) ?			\
-	 ({ __acquire(x); true; }) : false)
 #define get_ldev(_device) get_ldev_if_state(_device, D_INCONSISTENT)
 
 static inline void put_ldev(struct drbd_device *device)
@@ -2033,8 +2030,8 @@ static inline void put_ldev(struct drbd_device *device)
 	}
 }
 
-static inline int _get_ldev_if_state(struct drbd_device *device,
-				     enum drbd_disk_state mins)
+static inline int get_ldev_if_state(struct drbd_device *device,
+				    enum drbd_disk_state mins)
 {
 	int io_allowed;
 


* [PATCH 06/27] drbd: Simplify the bitmap locking functions.
From: Bart Van Assche @ 2026-06-09 22:04 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Marco Elver, Bart Van Assche,
	Christoph Hellwig, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder
In-Reply-To: <cover.1781042470.git.bvanassche@acm.org>

Call mutex_lock() instead of mutex_trylock() followed by mutex_lock().
Remove the code that depends on device->bitmap == NULL because this
can't happen. All drbd_bm_{lock,unlock}() callers guarantee that the
device->bitmap pointer is valid.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/drbd/drbd_bitmap.c | 20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 1c68f1774e28..e332a96fe90c 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -124,22 +124,8 @@ static void __bm_print_lock_info(struct drbd_device *device, const char *func)
 void drbd_bm_lock(struct drbd_device *device, char *why, enum bm_flag flags)
 {
 	struct drbd_bitmap *b = device->bitmap;
-	int trylock_failed;
 
-	if (!b) {
-		drbd_err(device, "FIXME no bitmap in drbd_bm_lock!?\n");
-		return;
-	}
-
-	trylock_failed = !mutex_trylock(&b->bm_change);
-
-	if (trylock_failed) {
-		drbd_warn(device, "%s[%d] going to '%s' but bitmap already locked for '%s' by %s[%d]\n",
-			  current->comm, task_pid_nr(current),
-			  why, b->bm_why ?: "?",
-			  b->bm_task->comm, task_pid_nr(b->bm_task));
-		mutex_lock(&b->bm_change);
-	}
+	mutex_lock(&b->bm_change);
 	if (BM_LOCKED_MASK & b->bm_flags)
 		drbd_err(device, "FIXME bitmap already locked in bm_lock\n");
 	b->bm_flags |= flags & BM_LOCKED_MASK;
@@ -151,10 +137,6 @@ void drbd_bm_lock(struct drbd_device *device, char *why, enum bm_flag flags)
 void drbd_bm_unlock(struct drbd_device *device)
 {
 	struct drbd_bitmap *b = device->bitmap;
-	if (!b) {
-		drbd_err(device, "FIXME no bitmap in drbd_bm_unlock!?\n");
-		return;
-	}
 
 	if (!(BM_LOCKED_MASK & device->bitmap->bm_flags))
 		drbd_err(device, "FIXME bitmap not locked in bm_unlock\n");


* [PATCH 07/27] drbd: Move two declarations
From: Bart Van Assche @ 2026-06-09 22:04 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Marco Elver, Bart Van Assche,
	Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
In-Reply-To: <cover.1781042470.git.bvanassche@acm.org>

Move the resources_mutex declaration in front of the functions that
acquire and release this mutex. Move the drbd_wait_misc() declaration
past the struct drbd_device definition. This patch prepares for adding
lock context annotations.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/drbd/drbd_int.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 4b9d69adb9d4..1f3f2157df8b 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -195,7 +195,7 @@ struct drbd_device_work {
 
 #include "drbd_interval.h"
 
-int drbd_wait_misc(struct drbd_device *, struct drbd_interval *);
+extern struct mutex resources_mutex;
 
 void lock_all_resources(void);
 void unlock_all_resources(void);
@@ -1094,6 +1094,7 @@ int drbd_bmio_set_n_write(struct drbd_device *device,
 			  struct drbd_peer_device *peer_device);
 int drbd_bmio_clear_n_write(struct drbd_device *device,
 			    struct drbd_peer_device *peer_device);
+int drbd_wait_misc(struct drbd_device *device, struct drbd_interval *i);
 
 /* Meta data layout
  *
@@ -1363,8 +1364,6 @@ extern struct bio_set drbd_md_io_bio_set;
 /* And a bio_set for cloning */
 extern struct bio_set drbd_io_bio_set;
 
-extern struct mutex resources_mutex;
-
 enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx,
 				      unsigned int minor);
 void drbd_destroy_device(struct kref *kref);


