* [PATCH v2 00/83] block: rnull: complete the rust null block driver
From: Andreas Hindborg @ 2026-06-09 19:07 UTC
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux, Yuan Tan, Andreas Hindborg
This series aims to bring the feature set of the Rust null block driver on
par with that of the C null_blk driver.
There are quite a few changes from v1 in this version. I tried to capture
everything in the change log, but I might have missed something along
the way.
I have prepared a tree with all dependencies applied at [1].
Best regards,
Andreas Hindborg
[1] git https://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux.git rnull-v7.1-rc2
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
Changes in v2:
- Fix shift direction in transfer length calculation.
- Retry page preload after reacquiring locks.
- Fix a bug where badblocks did not correctly limit IO size.
- Close TOCTOU window in configfs power-check stores (Alice).
- Use `bool` for semantically-boolean module parameters (Alice).
- Take `NumaNode` instead of a raw `i32` as the home node argument to `TagSet` (Alice).
- Use `c_void`, `c_uint`, and `c_int` from the prelude in `hctx` private data support (Alice).
- Use `size_of` from the prelude in `Request` private data support (Alice).
- Return `Ok(())` from `new_request_data` instead of `pin_init::zeroed` (Alice).
- Add `// CAST:` annotations to casts (Alice).
- Expand the comment on the `BLK_STS_.*` bindgen blocklist entry.
- Depend on "rust: module_param: return copy from value() for Copy types"
- block: rust: introduce `kernel::block::bio` module (Alice):
- Use `kernel::fmt::Display` for `Bio` and cache `raw_iter()`.
- Mark the `bio_advance_iter_single` helper `__rust_helper`.
- Use a `srctree/` link for the C header.
- Remove the stale reference-counting invariants from `Bio`.
- Take `Pin<&mut Self>` in `Bio::segment_iter` and `Request::bio_mut`.
- Document that the `bvec_iter` cursor can be copied and moved freely.
- Use `&raw mut` instead of `core::ptr::from_mut`.
- Narrow the `unsafe` block in `Request::command()` using `BitAnd` (Alice, Gary).
- Use `c_void` from the prelude and drop a spurious blank line in the `TagSet` flags module (Alice).
- Drop the `Tree` type alias in favor of `XArray<TreeNode>` in rnull (Alice).
- Use a `NoIo` memory allocation scope in `queue_rq` rather than passing `GFP_NOIO`.
- Add the missing comma between `memory_backed` and `submit_queues` in the configfs feature listing (Alice).
- Fix the `use_per_node_hctx` store to set `submit_queues` to the online node count instead of multiplying by it (Alice).
- Use `static_assert!` instead of a `build_assert!` constant for the page/sector width check (Alice).
- Fix a typo in the `TagSet::new` doc comment (Ken).
- block: rust: add `BadBlocks` for bad block tracking (Alice):
- Remove newline after `use` statements.
- Add C header link.
- Convert boolean to int with `into`.
- Remove duplicated docs from `enabled`.
- Use if/else rather than `then_some` in `set_bad`.
- Add a patch to rename `SECTOR_MASK` to `PAGE_SECTOR_MASK`.
- Use `pr_warn_once!` where applicable.
- Require `TagSet` private data to be `Send` for `TagSet` to be `Send`.
- Require `Operations::TagSetData: Sync`.
- Require `Operations::HwData: Send + Sync` and add a note on the bounds.
- Require `Operations::RequestData: Send` and add note on the bound.
- Add `TagSet::flags` to obtain flags and fix a bug in zoned emulation caused by taking a mutex under rcu read lock.
- Link to v1: https://msgid.link/20260216-rnull-v6-19-rc5-send-v1-0-de9a7af4b469@kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: rust-for-linux@vger.kernel.org
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: "Liam R. Howlett" <liam@infradead.org>
To: Alice Ryhl <aliceryhl@google.com>
To: Andreas Hindborg <a.hindborg@kernel.org>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
To: Benno Lossin <lossin@kernel.org>
To: Björn Roy Baron <bjorn3_gh@protonmail.com>
To: Boqun Feng <boqun.feng@gmail.com>
To: Boqun Feng <boqun@kernel.org>
To: Danilo Krummrich <dakr@kernel.org>
To: FUJITA Tomonori <fujita.tomonori@gmail.com>
To: Frederic Weisbecker <frederic@kernel.org>
To: Gary Guo <gary@garyguo.net>
To: Jens Axboe <axboe@kernel.dk>
To: John Stultz <jstultz@google.com>
To: Lorenzo Stoakes <ljs@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Lyude Paul <lyude@redhat.com>
To: Miguel Ojeda <ojeda@kernel.org>
To: Stephen Boyd <sboyd@kernel.org>
To: Thomas Gleixner <tglx@kernel.org>
To: Trevor Gross <tmgross@umich.edu>
---
Andreas Hindborg (83):
block: rust: fix `Send` bound for `GenDisk`
rust: block: rename `SECTOR_MASK` to `PAGE_SECTOR_MASK`
block: rnull: adopt new formatting guidelines
block: rnull: add module parameters
block: rnull: add macros to define configfs attributes
block: rust: fix generation of bindings to `BLK_STS_.*`
block: rust: change `queue_rq` request type to `Owned`
block: rust: add `Request` private data support
block: rust: document the lifetime of `Request`
block: rust: allow `hrtimer::Timer` in `RequestData`
block: rnull: add timer completion mode
block: rust: introduce `kernel::block::bio` module
block: rust: add `command` getter to `Request`
block: rust: mq: use GFP_KERNEL from prelude
block: rust: add `TagSet` flags
block: rnull: add memory backing
block: rnull: add submit queue count config option
block: rnull: add `use_per_node_hctx` config option
block: rust: allow specifying home node when constructing `TagSet`
block: rnull: allow specifying the home numa node
block: rust: add Request::sectors() method
block: rust: mq: add max_hw_discard_sectors support to GenDiskBuilder
block: rnull: add discard support
block: rust: add `NoDefaultScheduler` flag for `TagSet`
block: rnull: add no_sched module parameter and configfs attribute
block: rust: change sector type from usize to u64
block: rust: add `BadBlocks` for bad block tracking
block: rust: mq: add Request::end() method for custom status codes
block: rnull: add badblocks support
block: rnull: add badblocks_once support
block: rust: add `Segment::truncate`
block: rnull: add partial I/O support for bad blocks
block: rust: add `TagSet` private data support
block: rust: add `hctx` private data support
block: rnull: add volatile cache emulation
block: rust: implement `Sync` for `GenDisk`.
block: rust: add a back reference feature to `GenDisk`
block: rust: introduce an idle type state for `Request`
block: rust: add a request queue abstraction
block: rust: add a method to get the request queue for a request
block: rust: introduce `kernel::block::error`
block: rust: require `queue_rq` to return a `BlkResult`
block: rust: add `GenDisk::queue_data`
block: rnull: add bandwidth limiting
block: rnull: add blocking queue mode
block: rnull: add shared tags
block: rnull: add queue depth config option
block: rust: add an abstraction for `bindings::req_op`
block: rust: add a method to set the target sector of a request
block: rust: move gendisk vtable construction to separate function
block: rust: add zoned block device support
block: rust: add `TagSet::flags`
block: rnull: add zoned storage support
block: rust: add `map_queues` support
block: rust: add an abstraction for `struct blk_mq_queue_map`
block: rust: add polled completion support
block: rust: add accessors to `TagSet`
block: rnull: add polled completion support
block: rnull: add REQ_OP_FLUSH support
block: rust: add request flags abstraction
block: rust: add abstraction for block queue feature flags
block: rust: allow setting write cache and FUA flags for `GenDisk`
block: rust: add `Segment::copy_to_page_limit`
block: rnull: add fua support
block: rust: add `GenDisk::tag_set`
block: rust: add `TagSet::update_hw_queue_count`
block: rnull: add an option to change the number of hardware queues
block: rust: add an abstraction for `struct rq_list`
block: rust: add `queue_rqs` vtable hook
block: rnull: support queue_rqs
block: rust: remove the `is_poll` parameter from `queue_rq`
block: rust: add a debug assert for refcounts
block: rust: add `TagSet::tag_to_rq`
block: rust: add `Request::queue_index`
block: rust: add `Request::requeue`
block: rust: add `request_timeout` hook
block: rnull: add fault injection support
block: rust: add max_sectors option to `GenDiskBuilder`
block: rnull: allow configuration of the maximum IO size
block: rust: add `virt_boundary_mask` option to `GenDiskBuilder`
block: rnull: add `virt_boundary` option
block: rnull: add `shared_tag_bitmap` config option
block: rnull: add zone offline and readonly configfs files
drivers/block/rnull/Kconfig | 11 +
drivers/block/rnull/configfs.rs | 605 +++++++++++++--
drivers/block/rnull/configfs/macros.rs | 143 ++++
drivers/block/rnull/disk_storage.rs | 326 ++++++++
drivers/block/rnull/disk_storage/page.rs | 78 ++
drivers/block/rnull/rnull.rs | 1198 ++++++++++++++++++++++++++++--
drivers/block/rnull/util.rs | 65 ++
drivers/block/rnull/zoned.rs | 696 +++++++++++++++++
rust/bindgen_parameters | 6 +
rust/bindings/bindings_helper.h | 55 ++
rust/helpers/blk.c | 47 ++
rust/kernel/block.rs | 101 ++-
rust/kernel/block/badblocks.rs | 716 ++++++++++++++++++
rust/kernel/block/bio.rs | 147 ++++
rust/kernel/block/bio/vec.rs | 448 +++++++++++
rust/kernel/block/mq.rs | 78 +-
rust/kernel/block/mq/feature.rs | 76 ++
rust/kernel/block/mq/gen_disk.rs | 336 +++++++--
rust/kernel/block/mq/operations.rs | 489 +++++++++++-
rust/kernel/block/mq/request.rs | 677 ++++++++++++++---
rust/kernel/block/mq/request/command.rs | 65 ++
rust/kernel/block/mq/request/flag.rs | 65 ++
rust/kernel/block/mq/request_list.rs | 119 +++
rust/kernel/block/mq/request_queue.rs | 60 ++
rust/kernel/block/mq/tag_set.rs | 299 +++++++-
rust/kernel/block/mq/tag_set/flags.rs | 29 +
rust/kernel/error.rs | 3 +-
rust/kernel/page.rs | 2 +-
rust/kernel/time/hrtimer.rs | 5 +-
29 files changed, 6603 insertions(+), 342 deletions(-)
---
base-commit: 9e0898f1c0f134c6bad146ca8578f73c3e40ac0a
change-id: 20260215-rnull-v6-19-rc5-send-98c33ec692d6
prerequisite-change-id: 20250305-unique-ref-29fcd675f9e9:v17
prerequisite-patch-id: 6c6a7fdd56627293ec3bba61c495f16a0858700c
prerequisite-patch-id: c1958590235ee32d6ddb31ea168105bd9cf248f2
prerequisite-patch-id: c5a4b231dc8adf37e93ebdce308dacbe6a244bf3
prerequisite-patch-id: 541dba7938ba874f8d17fee05a36b1cd9fa2c4d7
prerequisite-patch-id: 3668fd640e4c411bae0c8ea9d986c3fa5d3c9e82
prerequisite-patch-id: da1274864841e267697be9529a50531126c64872
prerequisite-patch-id: c1463b6578e94b56d2bad41f6e614b5286fb1db3
prerequisite-patch-id: a31185fe1abbf553377d6d695c5d206eebc84358
prerequisite-patch-id: 4f392b5736e55a354ec3022644389f89b52fda42
prerequisite-patch-id: b6388ff0ebdd54610010d72a5398842a3c668bbf
prerequisite-change-id: 20251203-xarray-entry-send-00230f0744e6:v4
prerequisite-patch-id: 5d797523ed1bb94597570b6faa4cacea8d94b4f7
prerequisite-patch-id: f82bffce83d85ad4dd0bc9dab876e31c4500d467
prerequisite-patch-id: bc00e3c0a3694d8d490c782bc24b2a5786350da7
prerequisite-patch-id: 39c26c865ad383b133a742e5998e2b1f54999908
prerequisite-patch-id: 4082a1ae45104c2f3170197e186d83db552f9302
prerequisite-patch-id: de0c55224727e169d151d68a5316f0ae4549e4b8
prerequisite-patch-id: 57c6d2464a380542b5283817666540d2c97b0b61
prerequisite-patch-id: c788013f9319aa91f51f74f92f43cf7f2c04496f
prerequisite-patch-id: 959c962400d8595cc55b4f1b3a5501c2290a7d0e
prerequisite-patch-id: 66ed5c6a31fe2d775b5bc70774e3148fa3d860e5
prerequisite-patch-id: 869aa913843e11b467890ed35a1455458dbf3de4
prerequisite-change-id: 20260206-xarray-lockdep-fix-10f1cc68e5d7:v2
prerequisite-patch-id: e871db17a721fede1b7419b8236229190449885b
prerequisite-change-id: 20260130-page-volatile-io-05ff595507d3:v4
prerequisite-patch-id: 09224764d69c35c18e6fec846d4b7ba33c0e9cac
prerequisite-patch-id: cfd909257db3f5811c94d52ac2fc31cf220560c3
prerequisite-change-id: 20260128-gfp-noio-fbd41e135088:v2
prerequisite-patch-id: 420a09fdd0f2758f4d46228f99f29ff82f2d05f3
prerequisite-change-id: 20260212-impl-flags-inner-c61974b27b18:v2
prerequisite-patch-id: 379fb78c07b554278fae3c42d84d62bcfcfa0d45
prerequisite-change-id: 20260214-pin-slice-init-e8ef96fc07b9:v2
prerequisite-patch-id: cdf4e4b2b8c43bcb54b3ddf13a02e28c0e11e9ce
prerequisite-change-id: 20260215-page-additions-bc36046e9ffd:v2
prerequisite-patch-id: 6c6a7fdd56627293ec3bba61c495f16a0858700c
prerequisite-patch-id: c1958590235ee32d6ddb31ea168105bd9cf248f2
prerequisite-patch-id: c5a4b231dc8adf37e93ebdce308dacbe6a244bf3
prerequisite-patch-id: 541dba7938ba874f8d17fee05a36b1cd9fa2c4d7
prerequisite-patch-id: 3668fd640e4c411bae0c8ea9d986c3fa5d3c9e82
prerequisite-patch-id: da1274864841e267697be9529a50531126c64872
prerequisite-patch-id: c1463b6578e94b56d2bad41f6e614b5286fb1db3
prerequisite-patch-id: a31185fe1abbf553377d6d695c5d206eebc84358
prerequisite-patch-id: 4f392b5736e55a354ec3022644389f89b52fda42
prerequisite-patch-id: b6388ff0ebdd54610010d72a5398842a3c668bbf
prerequisite-patch-id: 1f57b529c53f4a650cbeeb7c1ff81653cb95e7f3
prerequisite-patch-id: 4d71a95c2d1a6a36339a9feda6296c33ec86f258
prerequisite-change-id: 20260215-cpu-helpers-08efb2572487:v2
prerequisite-patch-id: fd7f24bed247075d1946f9f526390772afb45236
prerequisite-patch-id: 7d243f4cd29a08a1eb2ca0e0e976fa82f0760f11
prerequisite-change-id: 20260215-export-do-unlocked-00a6ac9373d4:v2
prerequisite-patch-id: c65f4a3078f1acc1b77ea28b531e54664187dbce
prerequisite-change-id: 20260215-impl-flags-additions-0340ffcba5b9:v2
prerequisite-patch-id: 379fb78c07b554278fae3c42d84d62bcfcfa0d45
prerequisite-patch-id: 04c7db66a06be7a2566a23328d2c485ce24f1bb8
prerequisite-patch-id: 4d78d6d7aae15c51e6a1df2cb393392fb7ea90de
prerequisite-change-id: 20260215-ringbuffer-42455964aaf2:v2
prerequisite-patch-id: 44924a030c52ae111983078f1225510e9dc0c009
prerequisite-change-id: 20260215-configfs-c-default-groups-bdb0a44633a6:v2
prerequisite-patch-id: 03b8e71b79be89a73946f3c1f7248671c28ccd42
prerequisite-change-id: 20260215-unique-arc-as-ptr-32eb209dde1b:v2
prerequisite-patch-id: 20f44fe6cfe6b0e52b614bd64469fbf1df5c1e94
prerequisite-change-id: 20260215-rust-fault-inject-bc62f1083502:v2
prerequisite-patch-id: 03b8e71b79be89a73946f3c1f7248671c28ccd42
prerequisite-patch-id: 8b287be6364945d10e661e0828ad17b023f487e1
prerequisite-change-id: 20260215-hrtimer-active-f183411fe56b:v2
prerequisite-patch-id: e029dd2cb097192e597417e40d7d23bedaa79370
prerequisite-change-id: 20260529-modules-value-ref-e95a7ab94fdb:v2
prerequisite-patch-id: 618f9f3cfea3f8a03db5e73229d77b48f6549ab4
prerequisite-message-id: <20260411130254.3510128-2-wenzhaoliao@ruc.edu.cn>
prerequisite-patch-id: f714b166f93e453dddd01ed17c976b53e6da4957
prerequisite-change-id: 20260608-queue-data-sync-80b66ab312ac:v1
prerequisite-patch-id: ec86c4ec1531441a2c19085bf24ecc06819d7420
prerequisite-change-id: 20260608-update-hw-nodes-arg-940ecec0380a:v1
prerequisite-patch-id: a1e95b0ec36bf18976553fb8a2e17fd1527a6a1a
prerequisite-change-id: 20260608-configfs-fix-offset-6b3117158901:v1
prerequisite-patch-id: e8355bdd4444f8bda2663aa0bdcf3336de126255
prerequisite-change-id: 20260608-numa-node-id-85de708d4e8d:v1
prerequisite-patch-id: 8b82a179a91cd3e0ca8396eff81dae7bf66e5349
Best regards,
--
Andreas Hindborg <a.hindborg@kernel.org>
* [PATCH v2 49/83] block: rust: add a method to set the target sector of a request
From: Andreas Hindborg @ 2026-06-09 19:08 UTC
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a `block::mq::Request::set_sector` method that sets the target sector
of a request.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
rust/kernel/block/mq/request.rs | 7 +++++++
1 file changed, 7 insertions(+)
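For readers who want to try the calling convention outside the kernel, here
is a minimal userspace sketch of the setter. It is not the kernel code:
`RawRequest`, `Request::new` and `Request::sector` are hypothetical stand-ins
invented for illustration, and `UnsafeCell` stands in for the wrapper the real
`Request` uses. It only shows how taking `Pin<&mut Self>` gives the setter
exclusive access to the wrapped value.

```rust
use std::cell::UnsafeCell;
use std::pin::Pin;

/// Hypothetical stand-in for the C `struct request`; only the field that
/// the setter writes is modelled.
struct RawRequest {
    sector: u64,
}

/// Hypothetical model of the Rust wrapper around the C request.
struct Request(UnsafeCell<RawRequest>);

impl Request {
    /// Illustration-only constructor; not part of the patch.
    fn new(sector: u64) -> Self {
        Request(UnsafeCell::new(RawRequest { sector }))
    }

    /// Illustration-only getter, used to observe the result.
    fn sector(&self) -> u64 {
        // SAFETY: In this model the field is only written through
        // `set_sector`, which needs exclusive access, so no write can
        // race with this read.
        unsafe { (*self.0.get()).sector }
    }

    /// Same shape as the method added by the patch.
    fn set_sector(self: Pin<&mut Self>, sector: u64) {
        // SAFETY: `Pin<&mut Self>` proves exclusive access to `self`, and
        // the pointer returned by `UnsafeCell::get` is valid for writes.
        unsafe { (*self.0.get()).sector = sector }
    }
}
```

With this model, a caller pins a request with `Pin::new(&mut req)` and calls
`pinned.as_mut().set_sector(2048)`; a following `pinned.sector()` observes
the new value.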
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index 63e248970ab1..66ef2493c448 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -336,6 +336,13 @@ pub(crate) fn wrapper_ref(&self) -> &RequestDataWrapper<T> {
pub fn data_ref(&self) -> &T::RequestData {
&self.wrapper_ref().data
}
+
+ /// Set the target sector for the request.
+ #[inline(always)]
+ pub fn set_sector(self: Pin<&mut Self>, sector: u64) {
+ // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
+ unsafe { (*self.0 .0.get()).__sector = sector }
+ }
}
/// A wrapper around data stored in the private area of the C [`struct request`].
--
2.51.2
* [PATCH v2 27/83] block: rust: add `BadBlocks` for bad block tracking
From: Andreas Hindborg @ 2026-06-09 19:08 UTC
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a safe Rust wrapper around the Linux kernel's badblocks infrastructure
to track and manage defective sectors on block devices. The BadBlocks type
provides methods to:
- Mark sectors as bad or good (set_bad/set_good)
- Check if sector ranges contain bad blocks (check)
- Automatically handle memory management with PinnedDrop
The implementation includes comprehensive documentation with examples for
block device drivers that need to avoid known bad sectors to maintain
data integrity. Bad blocks information is used by device drivers,
filesystem layers, and device management tools.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
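The enable logic in this patch rests on a single atomic `shift` field: a
negative value means "disabled", and `enable()` does a compare-and-exchange
from -1 to 0. A minimal userspace sketch of that state machine follows.
`Tracker` is a hypothetical name, `AtomicI32` stands in for the kernel
`Atomic<c_int>`, and the initial values chosen in `new` (0 when enabled, -1
when disabled) are an assumption about what `badblocks_init` does on the C
side.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Hypothetical userspace model of the enable state kept in `shift`.
struct Tracker {
    shift: AtomicI32,
}

impl Tracker {
    /// Assumption: the C initializer stores 0 when enabled and -1 otherwise.
    fn new(enable: bool) -> Self {
        Tracker {
            shift: AtomicI32::new(if enable { 0 } else { -1 }),
        }
    }

    /// Moves the tracker from disabled (-1) to enabled (0).
    /// If the value is not -1 the exchange fails and nothing changes,
    /// which is why repeated calls have no effect.
    fn enable(&self) {
        let _ = self
            .shift
            .compare_exchange(-1, 0, Ordering::Relaxed, Ordering::Relaxed);
    }

    /// Any non-negative shift counts as enabled.
    fn enabled(&self) -> bool {
        self.shift.load(Ordering::Relaxed) >= 0
    }
}
```

Using compare-and-exchange rather than a plain store means a tracker that is
already enabled keeps whatever non-negative shift it has.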
rust/bindings/bindings_helper.h | 1 +
rust/kernel/block.rs | 1 +
rust/kernel/block/badblocks.rs | 716 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 718 insertions(+)
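The `BadBlocks` documentation in this patch describes each table entry as a
64-bit word holding a 54-bit sector offset, a 9-bit length (1-512) and a
1-bit acknowledged flag. The sketch below shows one way such an entry can be
packed and unpacked in plain Rust. The exact bit positions (flag in bit 63,
offset in bits 9..=62, length minus one in bits 0..=8) are an assumption made
for illustration, and `pack`, `unpack` and `entries_per_page` are hypothetical
helpers, not functions from the patch or from the C implementation.

```rust
/// Number of bits used for the length field.
const LEN_BITS: u32 = 9;
/// Mask for the length field (stores length minus one).
const LEN_MASK: u64 = (1 << LEN_BITS) - 1;
/// Mask for the acknowledged flag (assumed to be the top bit).
const ACK_MASK: u64 = 1 << 63;
/// Mask for the 54-bit sector offset between the two fields above.
const OFFSET_MASK: u64 = !(ACK_MASK | LEN_MASK);
/// Largest range a single entry can describe.
const MAX_LEN: u64 = 512;

/// Packs a bad range into one entry.
/// Returns `None` if the length is not in 1..=512 or the start sector
/// does not fit in 54 bits.
fn pack(start: u64, len: u64, acknowledged: bool) -> Option<u64> {
    if len == 0 || len > MAX_LEN || start >= (1 << 54) {
        return None;
    }
    let ack = if acknowledged { ACK_MASK } else { 0 };
    Some(ack | (start << LEN_BITS) | (len - 1))
}

/// Splits an entry back into (start sector, length, acknowledged).
fn unpack(entry: u64) -> (u64, u64, bool) {
    let start = (entry & OFFSET_MASK) >> LEN_BITS;
    let len = (entry & LEN_MASK) + 1;
    (start, len, entry & ACK_MASK != 0)
}

/// Number of 8-byte entries that fit in a table of `page_size` bytes.
fn entries_per_page(page_size: usize) -> usize {
    page_size / core::mem::size_of::<u64>()
}
```

Storing the length minus one is what lets 9 bits cover 1-512 sectors, and
`entries_per_page(4096)` gives the 512 ranges mentioned in the documentation
for systems with 4 KiB pages.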
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index b1fb3afee4ca..eaf05d60dda9 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -38,6 +38,7 @@
#include <drm/drm_ioctl.h>
#include <kunit/test.h>
#include <linux/auxiliary_bus.h>
+#include <linux/badblocks.h>
#include <linux/bitmap.h>
#include <linux/blk-mq.h>
#include <linux/blk_types.h>
diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
index eb512dad031b..96e48a2e6116 100644
--- a/rust/kernel/block.rs
+++ b/rust/kernel/block.rs
@@ -2,6 +2,7 @@
//! Types for working with the block layer.
+pub mod badblocks;
pub mod bio;
pub mod mq;
diff --git a/rust/kernel/block/badblocks.rs b/rust/kernel/block/badblocks.rs
new file mode 100644
index 000000000000..0aab661ed7be
--- /dev/null
+++ b/rust/kernel/block/badblocks.rs
@@ -0,0 +1,716 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Bad blocks tracking for block devices.
+//!
+//! This module provides a safe Rust wrapper around the badblocks
+//! infrastructure, which is used to track and manage bad sectors on block
+//! devices. Bad blocks are sectors that cannot reliably store data and should
+//! be avoided during I/O operations.
+//!
+//! C header: [`include/linux/badblocks.h`](srctree/include/linux/badblocks.h).
+
+use core::ops::{
+ Range,
+ RangeBounds, //
+};
+
+use crate::{
+ error::to_result,
+ page::PAGE_SIZE,
+ prelude::*,
+ sync::atomic::{
+ ordering,
+ Atomic, //
+ },
+ types::Opaque,
+};
+use pin_init::{
+ pin_data,
+ PinInit, //
+};
+
+/// A bad blocks tracker for managing defective sectors on a block device.
+///
+/// `BadBlocks` provides functionality to mark sectors as bad and check if
+/// ranges contain bad blocks. This is useful for some classes of drivers to
+/// maintain data integrity by avoiding known bad sectors.
+///
+/// # Storage Format
+///
+/// Bad blocks are stored in a compact format where each 64-bit entry contains:
+/// - **Sector offset** (54 bits): Starting sector of the bad range
+/// - **Length** (9 bits): Number of sectors (1-512) in the bad range
+/// - **Acknowledged flag** (1 bit): Whether the bad blocks have been acknowledged
+///
+/// The bad blocks tracker uses exactly one page ([`PAGE_SIZE`]) of memory to store
+/// bad block entries. This allows tracking up to `PAGE_SIZE/8` bad block ranges
+/// (typically 512 ranges on systems with 4KB pages).
+///
+/// # Locking
+///
+/// Operations on the structure are internally synchronized by a seqlock.
+///
+/// # Examples
+///
+/// Basic usage:
+///
+/// ```rust
+/// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+/// # use kernel::prelude::*;
+/// // Create a new bad blocks tracker
+/// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+///
+/// // Mark sectors 100-109 as bad (unacknowledged)
+/// bad_blocks.set_bad(100..110, false)?;
+///
+/// // Check if sector range 95-104 contains bad blocks
+/// match bad_blocks.check(95..105) {
+/// BlockStatus::None => pr_info!("No bad blocks found"),
+/// BlockStatus::Acknowledged(range) => pr_warn!("Acknowledged bad blocks: {:?}", range),
+/// BlockStatus::Unacknowledged(range) => pr_err!("Unacknowledged bad blocks: {:?}", range),
+/// }
+/// # Ok::<(), kernel::error::Error>(())
+/// ```
+/// # Invariants
+///
+/// - `self.blocks` is a valid `bindings::badblocks` struct.
+#[pin_data(PinnedDrop)]
+pub struct BadBlocks {
+ #[pin]
+ blocks: Opaque<bindings::badblocks>,
+}
+
+impl BadBlocks {
+ /// Creates a new bad blocks tracker.
+ ///
+ /// Initializes an empty bad blocks tracker that can manage defective sectors
+ /// on a block device. The tracker starts with no bad blocks recorded and
+ /// allocates a single page for storing bad block entries.
+ ///
+ /// # Returns
+ ///
+ /// Returns a [`PinInit`] that can be used to initialize a [`BadBlocks`] instance.
+ /// Initialization may fail with `ENOMEM` if memory allocation fails.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// // Create and initialize a bad blocks tracker
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // The tracker is ready to use with no bad blocks initially
+ /// match bad_blocks.check(0..100) {
+ /// BlockStatus::None => pr_info!("No bad blocks found initially"),
+ /// _ => unreachable!(),
+ /// }
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ pub fn new(enable: bool) -> impl PinInit<Self, Error> {
+ // INVARIANT: We initialize `self.blocks` below. If initialization fails, an error is
+ // returned.
+ try_pin_init!(Self {
+ blocks <- Opaque::try_ffi_init(|slot| {
+ // SAFETY: `slot` is a valid pointer to uninitialized memory
+ // allocated by the Opaque type. `badblocks_init` is safe to
+ // call with uninitialized memory.
+ to_result(unsafe { bindings::badblocks_init(slot, enable.into()) })
+ }),
+ })
+ }
+
+ fn shift_ref(&self) -> &Atomic<c_int> {
+ // SAFETY: By type invariant self.blocks is valid.
+ let ptr = unsafe { &raw const (*self.blocks.get()).shift };
+ // SAFETY: `shift` is only written by C code using atomic operations after initialization.
+ unsafe { Atomic::from_ptr(ptr.cast_mut().cast()) }
+ }
+
+ /// Enables the bad blocks tracker if it was previously disabled.
+ ///
+ /// Attempts to enable bad block tracking by transitioning the tracker from
+ /// a disabled state to an enabled state.
+ ///
+ /// # Behavior
+ ///
+ /// - If the tracker is disabled, it will be enabled.
+ /// - If the tracker is already enabled, this operation has no effect.
+ /// - The operation is atomic and thread-safe.
+ ///
+ /// # Usage
+ ///
+ /// Bad blocks trackers can be created in a disabled state and enabled later
+ /// when needed. This is useful for conditional bad block tracking or for
+ /// deferring activation until the device is fully initialized.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::BadBlocks;
+ /// # use kernel::prelude::*;
+ /// // Create a disabled bad blocks tracker
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(false), GFP_KERNEL)?;
+ /// assert!(!bad_blocks.enabled());
+ ///
+ /// // Enable it when needed
+ /// bad_blocks.enable();
+ /// assert!(bad_blocks.enabled());
+ ///
+ /// // Subsequent enable calls have no effect
+ /// bad_blocks.enable();
+ /// assert!(bad_blocks.enabled());
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ pub fn enable(&self) {
+ let _ = self.shift_ref().cmpxchg(-1, 0, ordering::Relaxed);
+ }
+
+ /// Checks whether the bad blocks tracker is currently enabled.
+ ///
+ /// Returns `true` if bad block tracking is active, `false` if it is disabled.
+ /// When disabled, the tracker will not perform bad block checks or operations.
+ ///
+ /// # Thread Safety
+ ///
+ /// This method is thread-safe and uses atomic operations to check the
+ /// tracker's state without requiring external synchronization.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::BadBlocks;
+ /// # use kernel::prelude::*;
+ /// // Create an enabled tracker
+ /// let enabled_tracker = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ /// assert!(enabled_tracker.enabled());
+ ///
+ /// // Create a disabled tracker
+ /// let disabled_tracker = KBox::pin_init(BadBlocks::new(false), GFP_KERNEL)?;
+ /// assert!(!disabled_tracker.enabled());
+ ///
+ /// // Enable and verify
+ /// disabled_tracker.enable();
+ /// assert!(disabled_tracker.enabled());
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ pub fn enabled(&self) -> bool {
+ self.shift_ref().load(ordering::Relaxed) >= 0
+ }
+
+ /// Marks a range of sectors as bad.
+ ///
+ /// Records a contiguous range of sectors as defective in the bad blocks tracker.
+ /// Bad sectors should be avoided during I/O operations to prevent data corruption.
+ /// The implementation may merge, split, or extend existing ranges as needed.
+ ///
+ /// # Parameters
+ ///
+ /// - `range` - The range of sectors to mark as bad. Each individual range is limited to 512
+ /// sectors maximum by the underlying implementation.
+ /// - `acknowledged` - Whether the bad blocks have been acknowledged to be bad. Acknowledged bad
+ /// blocks may be handled differently by some subsystems.
+ ///
+ /// # Acknowledgment Semantics
+ ///
+ /// - **Unacknowledged** (`acknowledged = false`): Newly discovered bad blocks that
+ /// need attention. These are often treated as errors by upper layers.
+ /// - **Acknowledged** (`acknowledged = true`): Blocks that have been confirmed bad. These may
+ /// be handled by remapping.
+ ///
+ /// # Range Management
+ ///
+ /// The implementation automatically:
+ /// - **Merges** adjacent or overlapping ranges with the same acknowledgment status
+ /// - **Splits** ranges when acknowledgment status differs
+ /// - **Extends** existing ranges when new bad blocks are adjacent
+ /// - **Limits** individual ranges to 512 sectors maximum (BB_MAX_LEN)
+ ///
+ /// Please see [C documentation] for details.
+ ///
+ /// # Performance
+ ///
+ /// Executes in O(n) time where n is the number of entries in the bad blocks table.
+ ///
+ /// # Returns
+ ///
+ /// * `Ok(())` - Bad blocks were successfully recorded
+ /// * `Err(ENOMEM)` - Insufficient space in bad blocks table (table full)
+ ///
+ /// # Examples
+ ///
+ /// Basic usage:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // Mark sectors 1000-1009 as bad (unacknowledged)
+ /// bad_blocks.set_bad(1000..1010, false)?;
+ ///
+ /// // Mark a single sector as bad and acknowledged
+ /// bad_blocks.set_bad(2000..2001, true)?;
+ ///
+ /// // Verify the bad blocks are recorded
+ /// assert!(matches!(bad_blocks.check(1000..1010), BlockStatus::Unacknowledged(_)));
+ /// assert!(matches!(bad_blocks.check(2000..2001), BlockStatus::Acknowledged(_)));
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ ///
+ /// Range merging behavior:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // Add adjacent ranges with same acknowledgment status
+ /// bad_blocks.set_bad(100..105, false)?; // Sectors 100-104
+ /// bad_blocks.set_bad(105..108, false)?; // Sectors 105-107
+ ///
+ /// // These will be merged into a single range 100-107
+ /// match bad_blocks.check(100..108) {
+ /// BlockStatus::Unacknowledged(range) => {
+ /// assert_eq!(range.start, 100);
+ /// assert_eq!(range.end, 108);
+ /// },
+ /// _ => panic!("Expected unacknowledged bad blocks"),
+ /// }
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ ///
+ /// Handling acknowledgment conflicts:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // Mark range as unacknowledged
+ /// bad_blocks.set_bad(200..210, false)?;
+ ///
+ /// // Acknowledge part of the range (will split)
+ /// bad_blocks.set_bad(205..208, true)?;
+ ///
+ /// // Now we have: unack[200-204], ack[205-207], unack[208-209]
+ /// assert!(matches!(bad_blocks.check(200..205), BlockStatus::Unacknowledged(_)));
+ /// assert!(matches!(bad_blocks.check(205..208), BlockStatus::Acknowledged(_)));
+ /// assert!(matches!(bad_blocks.check(208..210), BlockStatus::Unacknowledged(_)));
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ ///
+ /// [C documentation]: srctree/block/badblocks.c
+ pub fn set_bad(&self, range: impl RangeBounds<u64>, acknowledged: bool) -> Result {
+ let range = Self::range(range);
+
+ // SAFETY: By type invariant `self.blocks` is valid. The C function
+ // `badblocks_set` handles synchronization internally.
+ let status = unsafe {
+ bindings::badblocks_set(
+ self.blocks.get(),
+ range.start,
+ range.end - range.start,
+ if acknowledged { 1 } else { 0 },
+ )
+ };
+
+ if status {
+ Ok(())
+ } else {
+ Err(ENOMEM)
+ }
+ }
+
+ /// Marks a range of sectors as good.
+ ///
+ /// Removes a contiguous range of sectors from the bad blocks tracker,
+ /// indicating that these sectors are now reliable for I/O operations.
+ /// This is typically used after bad sectors have been repaired, remapped,
+ /// or determined to be false positives.
+ ///
+ /// # Parameters
+ ///
+ /// - `range` - The range of sectors to mark as good.
+ ///
+ /// # Behavior
+ ///
+ /// The implementation handles various scenarios automatically:
+ /// - **Complete removal**: If the range exactly matches a bad block range, it's removed
+ /// entirely.
+ /// - **Partial removal**: If the range partially overlaps, the bad block range is split or
+ /// trimmed.
+ /// - **No effect**: If the range doesn't overlap any bad blocks, the operation succeeds without
+ /// changes.
+ /// - **Range splitting**: If the cleared range is in the middle of a bad block range, it may
+ /// split the range in two.
+ ///
+ /// # Performance
+ ///
+ /// Executes in O(n) time where n is the number of entries in the bad blocks table.
+ ///
+ /// # Returns
+ ///
+ /// * `Ok(())` - Sectors were successfully marked as good (or were already good)
+ /// * `Err(EINVAL)` - Operation failed (typically due to table constraints)
+ ///
+ /// # Examples
+ ///
+ /// Basic usage after repair:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // Mark some sectors as bad initially
+ /// bad_blocks.set_bad(100..110, false)?;
+ /// assert!(matches!(bad_blocks.check(100..110), BlockStatus::Unacknowledged(_)));
+ ///
+ /// // After successful repair, mark them as good
+ /// bad_blocks.set_good(100..110)?;
+ /// assert!(matches!(bad_blocks.check(100..110), BlockStatus::None));
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ ///
+ /// Partial clearing:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // Mark a large range as bad
+ /// bad_blocks.set_bad(200..220, false)?;
+ ///
+ /// // Clear only the middle portion
+ /// bad_blocks.set_good(205..215)?; // Clear sectors 205-214
+ ///
+ /// // Now we have bad blocks at the edges: 200-204 and 215-219
+ /// assert!(matches!(bad_blocks.check(200..205), BlockStatus::Unacknowledged(_)));
+ /// assert!(matches!(bad_blocks.check(205..215), BlockStatus::None));
+ /// assert!(matches!(bad_blocks.check(215..220), BlockStatus::Unacknowledged(_)));
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ ///
+ /// Safe clearing of potentially good sectors:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // It's safe to clear sectors that were never marked as bad
+ /// bad_blocks.set_good(1000..1100)?; // No-op, but succeeds
+ /// assert!(matches!(bad_blocks.check(1000..1100), BlockStatus::None));
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ pub fn set_good(&self, range: impl RangeBounds<u64>) -> Result {
+ let range = Self::range(range);
+ // SAFETY: By type invariant `self.blocks` is valid. The C function
+ // `badblocks_clear` handles synchronization internally.
+ unsafe {
+ bindings::badblocks_clear(self.blocks.get(), range.start, range.end - range.start)
+ }
+ .then_some(())
+ .ok_or(EINVAL)
+ }
+
+ // Transform a `RangeBounds` into a half-open range: start included, end excluded.
+ fn range(range: impl RangeBounds<u64>) -> Range<u64> {
+ let start = match range.start_bound() {
+ core::ops::Bound::Included(start) => *start,
+ core::ops::Bound::Excluded(start) => start + 1,
+ core::ops::Bound::Unbounded => u64::MIN,
+ };
+
+ let end = match range.end_bound() {
+ core::ops::Bound::Included(end) => end + 1,
+ core::ops::Bound::Excluded(end) => *end,
+ core::ops::Bound::Unbounded => u64::MAX,
+ };
+
+ start..end
+ }
+
+ /// Checks if a range of sectors contains any bad blocks.
+ ///
+ /// Examines the specified sector range to determine if it contains any sectors
+ /// that have been marked as bad. This is typically called before performing I/O
+ /// operations to avoid accessing defective sectors. The check uses seqlocks to
+ /// ensure consistent reads even under concurrent modifications.
+ ///
+ /// # Parameters
+ ///
+ /// - `range` - The range of sectors to check (supports any type implementing
+ /// `RangeBounds<u64>`).
+ ///
+ /// # Returns
+ ///
+ /// Returns a [`BlockStatus`] indicating the state of the checked range:
+ ///
+ /// - `BlockStatus::None` - No bad blocks found in the specified range.
+ /// - `BlockStatus::Acknowledged(range)` - Contains acknowledged bad blocks.
+ /// - `BlockStatus::Unacknowledged(range)` - Contains unacknowledged bad blocks.
+ ///
+ /// The returned range indicates the **first bad block range** encountered that
+ /// overlaps with the checked area. If multiple separate bad ranges exist, only
+ /// the first is reported.
+ ///
+ /// # Performance
+ ///
+ /// The check operation uses binary search on the sorted bad blocks table,
+ /// providing O(log n) lookup time where n is the number of bad block ranges.
+ ///
+ /// # Examples
+ ///
+ /// Basic checking:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // Initially no bad blocks
+ /// assert!(matches!(bad_blocks.check(0..1000), BlockStatus::None));
+ ///
+ /// // Mark some sectors as bad
+ /// bad_blocks.set_bad(100..110, false)?;
+ ///
+ /// // Check various ranges
+ /// match bad_blocks.check(90..120) {
+ /// BlockStatus::Unacknowledged(range) => {
+ /// assert_eq!(range.start, 100);
+ /// assert_eq!(range.end, 110);
+ /// pr_warn!("Found unacknowledged bad blocks: {}-{}", range.start, range.end - 1);
+ /// },
+ /// _ => panic!("Expected bad blocks"),
+ /// }
+ ///
+ /// // Check range that doesn't overlap
+ /// assert!(matches!(bad_blocks.check(0..50), BlockStatus::None));
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ ///
+ /// Handling different acknowledgment states:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ ///
+ /// // Add both acknowledged and unacknowledged bad blocks
+ /// bad_blocks.set_bad(100..105, true)?; // Acknowledged
+ /// bad_blocks.set_bad(200..205, false)?; // Unacknowledged
+ ///
+ /// match bad_blocks.check(95..105) {
+ /// BlockStatus::Acknowledged(range) => {
+ /// pr_info!("Acknowledged bad blocks found, can potentially remap: {:?}", range);
+ /// // Continue with remapping logic
+ /// },
+ /// BlockStatus::Unacknowledged(range) => {
+ /// pr_err!("Unacknowledged bad blocks found, requires attention: {:?}", range);
+ /// // Handle as error condition
+ /// },
+ /// BlockStatus::None => {
+ /// // Safe to proceed with I/O
+ /// },
+ /// }
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ ///
+ /// Safe I/O operation pattern:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// # use core::ops::RangeBounds;
+ /// # fn perform_sector_read(range: impl RangeBounds<u64>) -> Result<()> { Ok(()) }
+ /// fn safe_read_sectors(
+ /// bad_blocks: &BadBlocks,
+ /// range: impl RangeBounds<u64> + Clone
+ /// ) -> Result<()> {
+ /// // Check for bad blocks before attempting I/O
+ /// match bad_blocks.check(range.clone()) {
+ /// BlockStatus::None => {
+ /// // Safe to proceed with I/O operation - convert range to
+ /// // start/count for legacy function.
+ /// perform_sector_read(range)
+ /// },
+ /// BlockStatus::Acknowledged(range) => {
+ /// pr_warn!("I/O intersects acknowledged bad blocks: {:?}", range);
+ /// // Potentially remap or skip bad sectors
+ /// Err(EIO)
+ /// },
+ /// BlockStatus::Unacknowledged(range) => {
+ /// pr_err!("I/O intersects unacknowledged bad blocks: {:?}", range);
+ /// // Treat as serious error
+ /// Err(EIO)
+ /// },
+ /// }
+ /// }
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ pub fn check(&self, range: impl RangeBounds<u64>) -> BlockStatus {
+ let mut first_bad = 0;
+ let mut bad_count = 0;
+ let range = Self::range(range);
+
+ // SAFETY: By type invariant `self.blocks` is valid. `first_bad` and
+ // `bad_count` are valid mutable references. The C function
+ // `badblocks_check` handles synchronization internally.
+ let ret = unsafe {
+ bindings::badblocks_check(
+ self.blocks.get(),
+ range.start,
+ range.end - range.start,
+ &mut first_bad,
+ &mut bad_count,
+ )
+ };
+
+ match ret {
+ 0 => BlockStatus::None,
+ 1 => BlockStatus::Acknowledged(first_bad..first_bad + bad_count),
+ -1 => BlockStatus::Unacknowledged(first_bad..first_bad + bad_count),
+ _ => {
+ debug_assert!(false, "Illegal return value from `badblocks_check`");
+ BlockStatus::None
+ }
+ }
+ }
+
+ /// Formats bad blocks information into a human-readable string.
+ ///
+ /// Exports the current bad blocks table to a text representation suitable
+ /// for display via sysfs. The output format shows each bad block range
+ /// with sector numbers and acknowledgment status.
+ ///
+ /// # Parameters
+ ///
+ /// - `page` - A page-sized buffer to write the formatted output into.
+ /// - `show_unacknowledged` - Whether to include unacknowledged bad blocks in output.
+ /// - `true`: Shows both acknowledged and unacknowledged bad blocks
+ /// - `false`: Shows only acknowledged bad blocks
+ ///
+ /// # Output Format
+ ///
+ /// The output consists of space-separated entries, each representing a bad block range:
+ /// - Format: `start_sector length [acknowledgment_status]`
+ /// - Acknowledged blocks: Just sector and length (e.g., "100 10")
+ /// - Unacknowledged blocks: Sector, length, and "u" suffix (e.g., "200 5 u")
+ ///
+ /// # Returns
+ ///
+ /// Returns the number of bytes written to the buffer, or a negative value on error.
+ /// The returned length can be used to extract the valid portion of the buffer.
+ ///
+ /// # Examples
+ ///
+ /// Basic usage:
+ ///
+ /// ```rust
+ /// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+ /// # use kernel::prelude::*;
+ /// # use kernel::page::PAGE_SIZE;
+ /// let bad_blocks = KBox::pin_init(BadBlocks::new(true), GFP_KERNEL)?;
+ /// let mut page = [0u8; PAGE_SIZE];
+ ///
+ /// // Add some bad blocks
+ /// bad_blocks.set_bad(100..110, true)?; // Acknowledged
+ /// bad_blocks.set_bad(200..205, false)?; // Unacknowledged
+ ///
+ /// // Show all bad blocks (including unacknowledged)
+ /// let len = bad_blocks.show(&mut page, true);
+ /// if len > 0 {
+ /// let output = core::str::from_utf8(&page[..len as usize]).unwrap_or("<invalid utf8>");
+ /// pr_info!("Bad blocks: {}", output);
+ /// // Output might be: "100 10 200 5 u"
+ /// }
+ /// # Ok::<(), kernel::error::Error>(())
+ /// ```
+ pub fn show(&self, page: &mut [u8; PAGE_SIZE], show_unacknowledged: bool) -> isize {
+ // SAFETY: By type invariant `self.blocks` is valid. The C function
+ // `badblocks_show` handles synchronization internally.
+ // `page.as_mut_ptr()` returns a valid pointer to a PAGE_SIZE buffer.
+ // The C function will not write beyond the provided buffer size.
+ unsafe {
+ bindings::badblocks_show(
+ self.blocks.get(),
+ page.as_mut_ptr(),
+ if show_unacknowledged { 1 } else { 0 },
+ )
+ }
+ }
+}
+
+#[pinned_drop]
+impl PinnedDrop for BadBlocks {
+ fn drop(self: Pin<&mut Self>) {
+ // SAFETY: We do not move out of `self` before it is dropped.
+ let this = unsafe { self.get_unchecked_mut() };
+ // SAFETY: By type invariant `this.blocks` is valid. `badblocks_exit` is
+ // safe to call during destruction and will properly clean up allocated
+ // resources.
+ unsafe { bindings::badblocks_exit(this.blocks.get()) };
+ }
+}
+
+// SAFETY: `BadBlocks` can be safely dropped from other threads.
+unsafe impl Send for BadBlocks {}
+
+// SAFETY: All `BadBlocks` methods use internal synchronization.
+unsafe impl Sync for BadBlocks {}
+
+/// Status of a sector range after checking for bad blocks.
+///
+/// This enum represents the result of checking a sector range against the bad blocks
+/// table. It distinguishes between ranges with no bad blocks, ranges with acknowledged
+/// bad blocks, and ranges with unacknowledged bad blocks.
+///
+/// # Examples
+///
+/// ```rust
+/// # use kernel::block::badblocks::{BadBlocks, BlockStatus};
+/// # use kernel::prelude::*;
+/// # use core::ops::{Range, RangeBounds};
+/// # fn perform_io(range: impl RangeBounds<u64>) -> Result<()> { Ok(()) }
+/// # fn remap_and_retry(io_range: impl RangeBounds<u64>, bad_range: Range<u64>)
+/// # -> Result<()> { Ok(()) }
+/// fn handle_io_request(bad_blocks: &BadBlocks, range: impl RangeBounds<u64> + Clone)
+/// -> Result<()>
+/// {
+/// match bad_blocks.check(range.clone()) {
+/// BlockStatus::None => {
+/// // Safe to proceed with I/O - convert range to start/count for legacy function
+/// perform_io(range)
+/// },
+/// BlockStatus::Acknowledged(bad_range) => {
+/// pr_warn!("I/O overlaps acknowledged bad blocks: {:?}", bad_range);
+/// // Attempt remapping or alternative strategy
+/// remap_and_retry(range, bad_range)
+/// },
+/// BlockStatus::Unacknowledged(bad_range) => {
+/// pr_err!("I/O overlaps unacknowledged bad blocks: {:?}", bad_range);
+/// // Treat as serious error
+/// Err(EIO)
+/// },
+/// }
+/// }
+/// # Ok::<(), kernel::error::Error>(())
+/// ```
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum BlockStatus {
+ /// No bad blocks found in the checked range.
+ None,
+ /// The range contains acknowledged bad blocks.
+ ///
+ /// The contained range represents the first bad block
+ /// range encountered.
+ Acknowledged(Range<u64>),
+ /// The range contains unacknowledged bad blocks that need attention.
+ ///
+ /// The contained range represents the boundaries of the first bad block
+ /// range encountered.
+ Unacknowledged(Range<u64>),
+}
--
2.51.2
* [PATCH v2 13/83] block: rust: add `command` getter to `Request`
From: Andreas Hindborg @ 2026-06-09 19:07 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux, Andreas Hindborg
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
From: Andreas Hindborg <a.hindborg@samsung.com>
Add a method to extract the command operation code from a request. The
command is obtained by masking the lower bits of `cmd_flags` as defined by
`REQ_OP_BITS`. This allows Rust block drivers to determine the type of
operation being requested.
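For illustration only, the masking can be sketched in plain user-space
Rust. This is not the kernel code: `REQ_OP_BITS` is hard-coded to 8 as an
assumption about the C definition, `command_from_flags` is a hypothetical
free function standing in for the new method, and the flag bits used in
`main` are arbitrary.

```rust
/// Assumed width of the operation field. In the kernel the value comes from
/// `bindings::REQ_OP_BITS`; 8 is hard-coded here only for this sketch.
const REQ_OP_BITS: u32 = 8;

/// Hypothetical stand-in for `Request::command`: keep only the low
/// `REQ_OP_BITS` bits of `cmd_flags`.
fn command_from_flags(cmd_flags: u32) -> u32 {
    cmd_flags & ((1u32 << REQ_OP_BITS) - 1)
}

fn main() {
    // Operation code 1 with two arbitrary flag bits set above the op field.
    let cmd_flags: u32 = (1 << 11) | (1 << 8) | 1;
    assert_eq!(command_from_flags(cmd_flags), 1);
    // A value that fits entirely in the op field passes through unchanged.
    assert_eq!(command_from_flags(0x7f), 0x7f);
}
```

A driver would compare the returned value against the `REQ_OP_*` constants
to dispatch on the operation type.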
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
rust/kernel/block/mq/request.rs | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index 98e54f0586d1..19bdf17de166 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -116,6 +116,13 @@ pub(crate) unsafe fn aref_from_raw(ptr: *mut bindings::request) -> ARef<Self> {
unsafe { ARef::from_raw(NonNull::new_unchecked(ptr.cast())) }
}
+ /// Get the command operation code of the request.
+ pub fn command(&self) -> u32 {
+ use core::ops::BitAnd;
+ // SAFETY: By C API contract and type invariant, `cmd_flags` is valid for read
+ unsafe { (*self.0.get()).cmd_flags }.bitand((1u32 << bindings::REQ_OP_BITS) - 1)
+ }
+
/// Complete the request by scheduling `Operations::complete` for
/// execution.
///
--
2.51.2
* [PATCH v2 12/83] block: rust: introduce `kernel::block::bio` module
From: Andreas Hindborg @ 2026-06-09 19:07 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add Rust abstractions for working with `struct bio`, the core IO command
descriptor for the block layer.
The `Bio` type wraps `struct bio` and provides safe access to the IO
vector describing the data buffers associated with the IO command. The
data buffers are represented as a vector of `Segment`s, where each
segment is a contiguous region of physical memory backed by `Page`.
The `BioSegmentIterator` provides iteration over segments in a single
bio, while `BioIterator` allows traversing a chain of bios. The
`Segment` type offers methods for copying data to and from pages, as
well as zeroing page contents, which are the fundamental operations
needed by block device drivers to process IO requests.
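As a rough user-space model of how the `Segment` copy and zero methods
bound each step to a single page, consider the sketch below. It is an
illustration under assumptions, not the kernel implementation:
`SegmentModel`, `next_chunk` and `chunks` are hypothetical names, and
`PAGE_SIZE` is fixed at 4096 for the example.

```rust
/// Assumed page size for this sketch; the kernel uses its own `PAGE_SIZE`.
const PAGE_SIZE: usize = 4096;

/// Hypothetical model of a segment: a byte offset into a run of pages and
/// the number of bytes left to process.
struct SegmentModel {
    offset: usize,
    len: usize,
}

impl SegmentModel {
    /// Returns `(page_idx, offset_in_page, length)` for the next chunk, which
    /// never crosses a page boundary, and advances the segment past it.
    fn next_chunk(&mut self) -> Option<(usize, usize, usize)> {
        if self.len == 0 {
            return None;
        }
        let offset_in_page = self.offset % PAGE_SIZE;
        let length = (PAGE_SIZE - offset_in_page).min(self.len);
        let page_idx = self.offset / PAGE_SIZE;
        // Mirror `Segment::advance`: grow the offset, shrink the length.
        self.offset += length;
        self.len -= length;
        Some((page_idx, offset_in_page, length))
    }
}

/// Collect all chunks of a segment starting at `offset` with `len` bytes.
fn chunks(offset: usize, len: usize) -> Vec<(usize, usize, usize)> {
    let mut seg = SegmentModel { offset, len };
    let mut out = Vec::new();
    while let Some(chunk) = seg.next_chunk() {
        out.push(chunk);
    }
    out
}

fn main() {
    for (page_idx, offset_in_page, length) in chunks(512, 8192) {
        println!("page {page_idx}: offset {offset_in_page}, {length} bytes");
    }
}
```

The copy methods in the patch additionally clamp the length by the space
left in the other page (`PAGE_SIZE - dst_offset` or
`PAGE_SIZE - src_offset`), which this model leaves out.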
The `Request` type is extended with methods to access the bio chain
associated with a request, allowing drivers to iterate over all data
buffers that need to be processed.
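The chain traversal can be pictured with a small user-space sketch. The
types here are hypothetical stand-ins (`BioModel`, `ChainIter`,
`chain_ids`, `sample_chain`), the link is an owned `Box` instead of the raw
`bi_next` pointer, and items are shared references, whereas the kernel
iterator yields `Pin<&mut Bio>`.

```rust
/// Hypothetical stand-in for `struct bio`: only an id and the next link.
struct BioModel {
    id: u32,
    next: Option<Box<BioModel>>,
}

/// Iterator over a chain, following the `next` link until it is `None`.
struct ChainIter<'a> {
    current: Option<&'a BioModel>,
}

impl<'a> Iterator for ChainIter<'a> {
    type Item = &'a BioModel;

    fn next(&mut self) -> Option<&'a BioModel> {
        // Take the current element, then step to its successor before
        // handing the current element out.
        let current = self.current.take()?;
        self.current = current.next.as_deref();
        Some(current)
    }
}

/// Collect the ids of all elements in the chain starting at `head`.
fn chain_ids(head: &BioModel) -> Vec<u32> {
    ChainIter { current: Some(head) }.map(|bio| bio.id).collect()
}

/// Build a three-element chain for the example.
fn sample_chain() -> BioModel {
    let third = BioModel { id: 3, next: None };
    let second = BioModel { id: 2, next: Some(Box::new(third)) };
    BioModel { id: 1, next: Some(Box::new(second)) }
}

fn main() {
    let head = sample_chain();
    for id in chain_ids(&head) {
        println!("bio {id}");
    }
}
```

Reading the successor before yielding the current element matches the
order used by `BioIterator::next` in this patch.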
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
rust/helpers/blk.c | 8 +
rust/kernel/block.rs | 1 +
rust/kernel/block/bio.rs | 147 ++++++++++++++
rust/kernel/block/bio/vec.rs | 411 ++++++++++++++++++++++++++++++++++++++++
rust/kernel/block/mq/request.rs | 49 +++++
rust/kernel/page.rs | 2 +-
6 files changed, 617 insertions(+), 1 deletion(-)
diff --git a/rust/helpers/blk.c b/rust/helpers/blk.c
index 20c512e46a7a..6a70e1306a3a 100644
--- a/rust/helpers/blk.c
+++ b/rust/helpers/blk.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
+#include <linux/bio.h>
#include <linux/blk-mq.h>
#include <linux/blkdev.h>
@@ -12,3 +13,10 @@ __rust_helper struct request *rust_helper_blk_mq_rq_from_pdu(void *pdu)
{
return blk_mq_rq_from_pdu(pdu);
}
+
+__rust_helper void rust_helper_bio_advance_iter_single(const struct bio *bio,
+ struct bvec_iter *iter,
+ unsigned int bytes)
+{
+ bio_advance_iter_single(bio, iter, bytes);
+}
diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
index b120e83d9425..eb512dad031b 100644
--- a/rust/kernel/block.rs
+++ b/rust/kernel/block.rs
@@ -2,6 +2,7 @@
//! Types for working with the block layer.
+pub mod bio;
pub mod mq;
/// Bit mask for masking out the sector index in a page.
diff --git a/rust/kernel/block/bio.rs b/rust/kernel/block/bio.rs
new file mode 100644
index 000000000000..af84f94a85fe
--- /dev/null
+++ b/rust/kernel/block/bio.rs
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Types for working with the bio layer.
+//!
+//! C header: [`include/linux/blk_types.h`](srctree/include/linux/blk_types.h)
+
+use crate::{
+ fmt,
+ types::Opaque, //
+};
+use core::{
+ marker::PhantomData,
+ pin::Pin,
+ ptr::NonNull, //
+};
+
+mod vec;
+
+pub use vec::{
+ BioSegmentIterator,
+ Segment, //
+};
+
+/// A block device IO descriptor (`struct bio`).
+///
+/// A `Bio` is the main unit of IO for the block layer. It describes an IO command and associated
+/// data buffers.
+///
+/// The data buffers associated with a `Bio` are represented by a vector of [`Segment`]s. These
+/// segments represent physically contiguous regions of memory. The memory is represented by
+/// [`Page`] descriptors internally.
+///
+ /// The vector of [`Segment`]s can be iterated by obtaining a [`BioSegmentIterator`].
+#[repr(transparent)]
+pub struct Bio(Opaque<bindings::bio>);
+
+impl Bio {
+ /// Returns an iterator over segments in this `Bio`. Does not consider
+ /// segments of other bios in this bio chain.
+ #[inline(always)]
+ pub fn segment_iter(self: Pin<&mut Self>) -> BioSegmentIterator<'_> {
+ BioSegmentIterator::new(self)
+ }
+
+ /// Get the number of io vectors in this bio.
+ fn io_vec_count(&self) -> u16 {
+ // SAFETY: By the type invariant of `Bio` and existence of `&self`,
+ // `self.0` is valid for read.
+ unsafe { (*self.0.get()).bi_vcnt }
+ }
+
+ /// Get a pointer to the `bio_vec` array of this bio.
+ #[inline(always)]
+ fn io_vec(&self) -> NonNull<bindings::bio_vec> {
+ let this = self.0.get();
+
+ // SAFETY: By the type invariant of `Bio` and existence of `&self`,
+ // `this` is valid for read.
+ let vec_ptr = unsafe { (*this).bi_io_vec };
+
+ // SAFETY: By C API contract, bi_io_vec is always set, even if bi_vcnt
+ // is zero.
+ unsafe { NonNull::new_unchecked(vec_ptr) }
+ }
+
+ /// Return a copy of the `bvec_iter` for this `Bio`. This iterator always
+ /// indexes to a valid `bio_vec` entry.
+ #[inline(always)]
+ fn raw_iter(&self) -> bindings::bvec_iter {
+ // SAFETY: By the type invariant of `Bio` and existence of `&self`,
+ // `self` is valid for read.
+ unsafe { (*self.0.get()).bi_iter }
+ }
+
+ /// Create an instance of `Bio` from a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that the `ptr` is valid for use as a reference to
+ /// `Bio` for the duration of `'a`.
+ #[inline(always)]
+ pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::bio) -> Option<&'a Self> {
+ Some(
+ // SAFETY: by the safety requirement of this function, `ptr` is
+ // valid for read for the duration of the returned lifetime
+ unsafe { &*NonNull::new(ptr)?.as_ptr().cast::<Bio>() },
+ )
+ }
+
+ /// Create an instance of `Bio` from a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that the `ptr` is valid for use as a unique reference
+ /// to `Bio` for the duration of `'a`.
+ #[inline(always)]
+ pub(crate) unsafe fn from_raw_mut<'a>(ptr: *mut bindings::bio) -> Option<Pin<&'a mut Self>> {
+ // SAFETY: by the safety requirement of this function, `ptr` is valid
+ // for use as a unique reference for the duration of the returned lifetime.
+ let bio = unsafe { &mut *NonNull::new(ptr)?.as_ptr().cast::<Bio>() };
+
+ // SAFETY: `bindings::bio` is pinned.
+ Some(unsafe { Pin::new_unchecked(bio) })
+ }
+}
+
+impl fmt::Display for Bio {
+ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+ let iter = self.raw_iter();
+ write!(
+ f,
+ "Bio({:?}, vcnt: {}, idx: {}, size: 0x{:x}, completed: 0x{:x})",
+ self.0.get(),
+ self.io_vec_count(),
+ iter.bi_idx,
+ iter.bi_size,
+ iter.bi_bvec_done
+ )
+ }
+}
+
+/// An iterator over `Bio` in a bio chain, yielding `&mut Bio`.
+///
+/// # Invariants
+///
+/// `bio` must be either `None` or be valid for use as a `&mut Bio`.
+pub struct BioIterator<'a> {
+ pub(crate) bio: Option<NonNull<Bio>>,
+ pub(crate) _p: PhantomData<&'a ()>,
+}
+
+impl<'a> core::iter::Iterator for BioIterator<'a> {
+ type Item = Pin<&'a mut Bio>;
+
+ #[inline(always)]
+ fn next(&mut self) -> Option<Pin<&'a mut Bio>> {
+ let mut current = self.bio.take()?;
+ // SAFETY: By the type invariant of `Bio` and type invariant on `Self`,
+ // `current` is valid for use as a unique reference.
+ let next = unsafe { (*current.as_ref().0.get()).bi_next };
+ self.bio = NonNull::new(next.cast());
+ // SAFETY:
+ // - By type invariant, `bio` is valid for use as a reference.
+ // - `bindings::bio` is pinned.
+ Some(unsafe { Pin::new_unchecked(current.as_mut()) })
+ }
+}
diff --git a/rust/kernel/block/bio/vec.rs b/rust/kernel/block/bio/vec.rs
new file mode 100644
index 000000000000..99ab164d4038
--- /dev/null
+++ b/rust/kernel/block/bio/vec.rs
@@ -0,0 +1,411 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Types for working with `struct bio_vec` IO vectors
+//!
+//! C header: [`include/linux/bvec.h`](srctree/include/linux/bvec.h)
+
+use super::Bio;
+use crate::{
+ error::{
+ code,
+ Result, //
+ },
+ page::{
+ Page,
+ SafePage,
+ PAGE_SIZE, //
+ },
+ prelude::*, //
+};
+use core::{
+ fmt,
+ mem::ManuallyDrop, //
+};
+
+/// A segment of an IO request.
+///
+/// [`Segment`] represents a contiguous range of physical memory addresses of an IO request. A
+ /// segment has an offset and a length, representing the amount of data that needs to be processed.
+/// Processing the data increases the offset and reduces the length.
+///
+/// The data buffer of a [`Segment`] is borrowed from a `Bio`.
+///
+/// # Implementation details
+///
+/// In the context of user driven block IO, the pages backing a [`Segment`] are often mapped to user
+/// space concurrently with the IO operation. Further, the page backing a `Segment` may be part of
+/// multiple IO operations, if user space decides to issue multiple concurrent IO operations
+/// involving the same page. Thus, the data represented by a [`Segment`] must always be assumed to
+/// be subject to racy writes.
+///
+ /// A [`Segment`] is a wrapper around a `struct bio_vec`.
+ ///
+ /// # Invariants
+ ///
+ /// `bio_vec` must always be initialized and valid for read and write.
+pub struct Segment<'a> {
+ bio_vec: bindings::bio_vec,
+ _marker: core::marker::PhantomData<&'a ()>,
+}
+
+impl Segment<'_> {
+ /// Get the length of the segment in bytes.
+ #[inline(always)]
+ pub fn len(&self) -> u32 {
+ self.bio_vec.bv_len
+ }
+
+ /// Returns true if the length of the segment is 0.
+ #[inline(always)]
+ pub fn is_empty(&self) -> bool {
+ self.len() == 0
+ }
+
+ /// Get the offset field of the `bio_vec`.
+ #[inline(always)]
+ pub fn offset(&self) -> usize {
+ self.bio_vec.bv_offset as usize
+ }
+
+ /// Advance the offset of the segment.
+ ///
+ /// If `count` is greater than the remaining size of the segment, an error
+ /// is returned.
+ pub fn advance(&mut self, count: u32) -> Result {
+ if self.len() < count {
+ return Err(code::EINVAL);
+ }
+
+ self.bio_vec.bv_offset += count;
+ self.bio_vec.bv_len -= count;
+ Ok(())
+ }
+
+ /// Copy data of this segment into `dst_page`.
+ ///
+ /// Copies data from the current offset up to the next page boundary, at most `PAGE_SIZE -
+ /// (self.offset() % PAGE_SIZE)` bytes, limited by `self.len()` and `PAGE_SIZE - dst_offset`.
+ /// Data is placed at `dst_offset` in `dst_page`. Advances offset and reduces length of `self`.
+ ///
+ /// Returns the number of bytes copied.
+ #[inline(always)]
+ pub fn copy_to_page(&mut self, dst_page: Pin<&mut SafePage>, dst_offset: usize) -> usize {
+ // SAFETY: We are not moving out of `dst_page`.
+ let dst_page = unsafe { Pin::into_inner_unchecked(dst_page) };
+ let src_offset = self.offset() % PAGE_SIZE;
+ debug_assert!(dst_offset <= PAGE_SIZE);
+ let length = (PAGE_SIZE - src_offset)
+ .min(self.len() as usize)
+ .min(PAGE_SIZE - dst_offset);
+ let page_idx = self.offset() / PAGE_SIZE;
+
+ // SAFETY: self.bio_vec is valid and thus bv_page must be a valid
+ // pointer to a `struct page` array.
+ let src_page = unsafe { Page::from_raw(self.bio_vec.bv_page.add(page_idx)) };
+
+ src_page
+ .with_pointer_into_page(src_offset, length, |src| {
+ // SAFETY:
+ // - If `with_pointer_into_page` calls this closure, it has performed bounds
+ // checking and guarantees that `src` is valid for `length` bytes.
+ // - Any other operations to `src` are atomic or user space operations.
+ // - We have exclusive ownership of `dst_page` and thus this write will not race.
+ unsafe { dst_page.write_bytewise_atomic(src, dst_offset, length) }
+ })
+ .expect("Assertion failure, bounds check failed.");
+
+ self.advance(length as u32)
+ .expect("Assertion failure, bounds check failed.");
+
+ length
+ }
+
+ /// Copy data to the current page of this segment from `src_page`.
+ ///
+ /// Copies at most `PAGE_SIZE - (self.offset() % PAGE_SIZE)` bytes from `src_page`, starting at
+ /// `src_offset`, to this segment at `self.offset()`, limited by `self.len()` and `PAGE_SIZE -
+ /// src_offset`. This call will advance offset and reduce length of `self`.
+ ///
+ /// Returns the number of bytes copied.
+ pub fn copy_from_page(&mut self, src_page: &SafePage, src_offset: usize) -> usize {
+ let dst_offset = self.offset() % PAGE_SIZE;
+ debug_assert!(src_offset <= PAGE_SIZE);
+ let length = (PAGE_SIZE - dst_offset)
+ .min(self.len() as usize)
+ .min(PAGE_SIZE - src_offset);
+ let page_idx = self.offset() / PAGE_SIZE;
+
+ // SAFETY: self.bio_vec is valid and thus bv_page must be a valid
+ // pointer to a `struct page`.
+ let dst_page = unsafe { Page::from_raw(self.bio_vec.bv_page.add(page_idx)) };
+
+ dst_page
+ .with_pointer_into_page(dst_offset, length, |dst| {
+ // SAFETY:
+ // - If `with_pointer_into_page` calls this closure, then it has performed bounds
+ // checks and guarantees that `dst` is valid for `length` bytes.
+ // - Any other operations to `dst` are atomic or user space operations.
+ // - Since we have a shared reference to `src_page`, the read cannot race with any
+ // writes to `src_page`.
+ unsafe { src_page.read_bytewise_atomic(dst, src_offset, length) }
+ })
+ .expect("Assertion failure, bounds check failed.");
+
+ self.advance(length as u32)
+ .expect("Assertion failure, bounds check failed.");
+
+ length
+ }
+
+ /// Copy zeroes to the current page of this segment.
+ ///
+ /// Writes at most `PAGE_SIZE - (self.offset() % PAGE_SIZE)` zero bytes, limited by
+ /// `self.len()`, to this segment starting at `self.offset()`. This call will advance offset and
+ /// reduce length of `self`.
+ ///
+ /// Returns the number of bytes written to this segment.
+ pub fn zero_page(&mut self) -> usize {
+ let offset = self.offset() % PAGE_SIZE;
+ let length = (PAGE_SIZE - offset).min(self.len() as usize);
+ let page_idx = self.offset() / PAGE_SIZE;
+
+ // SAFETY: self.bio_vec is valid and thus bv_page must be a valid
+ // pointer to a `struct page`. We do not own the page, but we prevent
+ // drop by wrapping the `Page` in `ManuallyDrop`.
+ let dst_page =
+ ManuallyDrop::new(unsafe { Page::from_raw(self.bio_vec.bv_page.add(page_idx)) });
+
+ // SAFETY: TODO: This might race with user space writes.
+ unsafe { dst_page.fill_zero_raw(offset, length) }
+ .expect("Assertion failure, bounds check failed.");
+
+ self.advance(length as u32)
+ .expect("Assertion failure, bounds check failed.");
+
+ length
+ }
+}
+
+impl core::fmt::Display for Segment<'_> {
+ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+ write!(
+ f,
+ "Segment {:?} len: {}, offset: {}",
+ self.bio_vec.bv_page, self.bio_vec.bv_len, self.bio_vec.bv_offset
+ )
+ }
+}
+
+ /// An iterator over the [`Segment`]s of a single [`Bio`].
+///
+/// The iterator takes a copy of the bio's `bvec_iter` when it is created and
+/// advances that copy as it yields [`Segment`]s, leaving the `Bio` untouched. A
+/// `struct bvec_iter` is a standalone cursor into the bio's `bio_vec` array: as
+/// described in the kernel's [immutable biovecs] documentation, the `bio_vec`
+/// array is immutable once the bio is submitted and all of the position that
+/// changes while iterating is held in the `bvec_iter`, not in the array. Such an
+/// iterator can therefore be freely copied and moved, and advancing one copy
+/// affects neither the `Bio` nor any other copy of the iterator.
+///
+/// [immutable biovecs]: srctree/Documentation/block/biovecs.rst
+///
+/// # Invariants
+///
+/// If `iter.bi_size` > 0, `iter` must always index a valid `bio_vec` in `bio.io_vec()`.
+pub struct BioSegmentIterator<'a> {
+ bio: Pin<&'a mut Bio>,
+ iter: bindings::bvec_iter,
+}
+
+impl<'a> BioSegmentIterator<'a> {
+ /// Create a new segment iterator for iterating the segments of `bio`. The
+ /// iterator starts at the beginning of `bio`.
+ #[inline(always)]
+ pub(crate) fn new(bio: Pin<&'a mut Bio>) -> BioSegmentIterator<'a> {
+ let iter = bio.raw_iter();
+
+ // INVARIANT: `bio.raw_iter()` returns an index that indexes into a valid
+ // `bio_vec` in `bio.io_vec()`.
+ Self { bio, iter }
+ }
+
+ // The accessors in this implementation block are modelled after C side
+ // macros and static functions `bvec_iter_*` and `mp_bvec_iter_*` from
+ // bvec.h.
+
+ /// Construct a `bio_vec` from the current iterator state.
+ ///
+ /// This will return a `bio_vec` of size <= `PAGE_SIZE`.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ unsafe fn io_vec(&self) -> bindings::bio_vec {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+ // greater than 0.
+ unsafe {
+ bindings::bio_vec {
+ bv_page: self.page(),
+ bv_len: self.len(),
+ bv_offset: self.offset(),
+ }
+ }
+ }
+
+ /// Get the currently indexed `bio_vec` entry.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ #[inline(always)]
+ unsafe fn bvec(&self) -> &bindings::bio_vec {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By the safety requirement of this function and the type
+ // invariant of `Self`, `self.iter.bi_idx` indexes into a valid
+ // `bio_vec`
+ unsafe { self.bio.io_vec().offset(self.iter.bi_idx as isize).as_ref() }
+ }
+
+ /// Get the currently indexed page, indexing into pages of order >= 0.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ #[inline(always)]
+ unsafe fn page(&self) -> *mut bindings::page {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By C API contract, the following offset cannot exceed pages
+ // allocated to this bio.
+ unsafe { self.mp_page().add(self.mp_page_idx()) }
+ }
+
+ /// Get the remaining bytes in the current page. Never more than PAGE_SIZE.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ #[inline(always)]
+ unsafe fn len(&self) -> u32 {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+ // greater than 0.
+ unsafe {
+ self.mp_len()
+ .min((bindings::PAGE_SIZE as u32) - self.offset())
+ }
+ }
+
+ /// Get the offset from the last page boundary in the currently indexed
+ /// `bio_vec` entry. Never more than PAGE_SIZE.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ #[inline(always)]
+ unsafe fn offset(&self) -> u32 {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+ // greater than 0.
+ unsafe { self.mp_offset() % (bindings::PAGE_SIZE as u32) }
+ }
+
+ /// Return the first page of the currently indexed `bio_vec` entry. This
+ /// might be a multi-page entry, meaning that page might have order > 0.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ #[inline(always)]
+ unsafe fn mp_page(&self) -> *mut bindings::page {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+ // greater than 0.
+ unsafe { self.bvec().bv_page }
+ }
+
+ /// Get the offset in whole pages into the currently indexed `bio_vec`. This
+ /// can be more than 0 if the page has order > 0.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ #[inline(always)]
+ unsafe fn mp_page_idx(&self) -> usize {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+ // greater than 0.
+ (unsafe { self.mp_offset() } / (bindings::PAGE_SIZE as u32)) as usize
+ }
+
+ /// Get the offset in the currently indexed `bio_vec` multi-page entry. This
+ /// can be more than `PAGE_SIZE` if the page has order > 0.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ #[inline(always)]
+ unsafe fn mp_offset(&self) -> u32 {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+ // greater than 0.
+ unsafe { self.bvec().bv_offset + self.iter.bi_bvec_done }
+ }
+
+ /// Get the number of remaining bytes for the currently indexed `bio_vec`
+ /// entry. Can be more than PAGE_SIZE for `bio_vec` entries with pages of
+ /// order > 0.
+ ///
+ /// # Safety
+ ///
+ /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+ /// method.
+ #[inline(always)]
+ unsafe fn mp_len(&self) -> u32 {
+ debug_assert!(self.iter.bi_size > 0);
+ // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+ // greater than 0.
+ self.iter
+ .bi_size
+ .min(unsafe { self.bvec().bv_len } - self.iter.bi_bvec_done)
+ }
+}
+
+impl<'a> core::iter::Iterator for BioSegmentIterator<'a> {
+ type Item = Segment<'a>;
+
+ #[inline(always)]
+ fn next(&mut self) -> Option<Self::Item> {
+ if self.iter.bi_size == 0 {
+ return None;
+ }
+
+ // SAFETY: We checked that `self.iter.bi_size` > 0 above.
+ let bio_vec_ret = unsafe { self.io_vec() };
+
+ // SAFETY: By existence of reference `&bio`, `bio.0` contains a valid
+ // `struct bio`. By type invariant of `BioSegmentIterator` `self.iter`
+ // indexes into a valid `bio_vec` entry. By C API contract, `bv_len`
+ // does not exceed the size of the bio.
+ unsafe {
+ bindings::bio_advance_iter_single(
+ self.bio.0.get(),
+ &raw mut self.iter,
+ bio_vec_ret.bv_len,
+ )
+ };
+
+ Some(Segment {
+ bio_vec: bio_vec_ret,
+ _marker: core::marker::PhantomData,
+ })
+ }
+}
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index 0b14f584c9d9..98e54f0586d1 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -33,9 +33,15 @@
use core::{
ffi::c_void,
marker::PhantomData,
+ pin::Pin,
ptr::NonNull, //
};
+use crate::block::bio::{
+ Bio,
+ BioIterator, //
+};
+
/// A wrapper around a blk-mq [`struct request`]. This represents an IO request.
///
/// # Lifetime
@@ -127,6 +133,49 @@ pub fn complete(this: ARef<Self>) {
}
}
+ /// Get a reference to the first [`Bio`] in this request.
+ #[inline(always)]
+ pub fn bio(&self) -> Option<&Bio> {
+ // SAFETY: By type invariant of `Self`, `self.0` is valid and the deref
+ // is safe.
+ let ptr = unsafe { (*self.0.get()).bio };
+ // SAFETY: By C API contract, if `bio` is not null it will have a
+ // positive refcount at least for the duration of the lifetime of
+ // `&self`.
+ unsafe { Bio::from_raw(ptr) }
+ }
+
+ /// Get a mutable reference to the first [`Bio`] in this request.
+ #[inline(always)]
+ pub fn bio_mut(self: Pin<&mut Self>) -> Option<Pin<&mut Bio>> {
+ // SAFETY: By type invariant of `Self`, `self.0` is valid and the deref
+ // is safe.
+ let ptr = unsafe { (*self.0.get()).bio };
+ // SAFETY: By C API contract, if `bio` is not null it will have a
+ // positive refcount at least for the duration of the lifetime of
+ // `&mut self`.
+ unsafe { Bio::from_raw_mut(ptr) }
+ }
+
+ /// Get an iterator over all bio structures in this request.
+ #[inline(always)]
+ pub fn bio_iter_mut<'a>(self: &'a mut Owned<Self>) -> BioIterator<'a> {
+ // INVARIANT: By C API contract, if the bio pointer is not null, it is a valid `struct bio`.
+ // `NonNull::new` will return `None` if the pointer is null.
+ BioIterator {
+ // SAFETY: By type invariant `self.0` is a valid `struct request`.
+ bio: NonNull::new(unsafe { (*self.0.get()).bio.cast() }),
+ _p: PhantomData,
+ }
+ }
+
+ /// Get the target sector for the request.
+ #[inline(always)]
+ pub fn sector(&self) -> usize {
+ // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
+ unsafe { (*self.0.get()).__sector as usize }
+ }
+
/// Return a pointer to the [`RequestDataWrapper`] stored in the private area
/// of the request structure.
///
diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
index e4585e1dba0c..a3473dabf587 100644
--- a/rust/kernel/page.rs
+++ b/rust/kernel/page.rs
@@ -282,7 +282,7 @@ fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
/// different addresses. However, even if the addresses are different, the underlying memory is
/// still the same for these purposes (e.g., it's still a data race if they both write to the
/// same underlying byte at the same time).
- fn with_pointer_into_page<T>(
+ pub(crate) fn with_pointer_into_page<T>(
&self,
off: usize,
len: usize,
--
2.51.2
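As a userspace illustration of the `bvec_iter_*`-style accessors in the patch above, the following sketch splits one multi-page entry into single-page segments. `MpBvec`, `Seg`, `split`, the numbered pages, and the fixed 4 KiB `PAGE_SIZE` are assumptions made for this sketch. It models a single `bio_vec` entry only and does not use the kernel API.

```rust
/// Page size assumed for this sketch; the patch uses `bindings::PAGE_SIZE`.
const PAGE_SIZE: u32 = 4096;

/// Hypothetical stand-in for a multi-page `bio_vec`. Pages are numbered
/// instead of being referenced through `struct page` pointers.
#[derive(Clone, Copy)]
struct MpBvec {
    first_page: usize,
    offset: u32,
    len: u32,
}

/// One single-page segment, as produced by each iteration step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Seg {
    page: usize,
    offset: u32,
    len: u32,
}

/// Split `bv` into segments that never cross a page boundary.
/// `bi_size` models the bytes remaining in the iterator.
fn split(bv: MpBvec, mut bi_size: u32) -> Vec<Seg> {
    let mut done = 0u32; // models `bi_bvec_done`
    let mut out = Vec::new();
    while bi_size > 0 && done < bv.len {
        // Same arithmetic as `mp_offset()` and `mp_len()`.
        let mp_offset = bv.offset + done;
        let mp_len = bi_size.min(bv.len - done);
        // Same arithmetic as `offset()`, `len()` and `mp_page_idx()`.
        let offset = mp_offset % PAGE_SIZE;
        let len = mp_len.min(PAGE_SIZE - offset);
        let page = bv.first_page + (mp_offset / PAGE_SIZE) as usize;
        out.push(Seg { page, offset, len });
        // Advance the way a single-entry iterator advance would.
        done += len;
        bi_size -= len;
    }
    out
}
```

For an 8 KiB entry that starts 512 bytes into its first page, `split` yields three segments: the tail of the first page, one full page, and the first 512 bytes of a third page.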
* [PATCH v2 11/83] block: rnull: add timer completion mode
From: Andreas Hindborg @ 2026-06-09 19:07 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a timer completion mode to `rnull`. This will complete requests after a
specified time has elapsed. To use this mode of operation, set `irqmode` to
`2` and write a timeout in nanoseconds to `completion_nsec`.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
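A rough userspace sketch of the parameter handling described above: `irqmode` maps to one of three modes, and `completion_nsec` is checked to fit in an `i64` before it becomes a delay. `IrqMode`, `completion_delay`, the string errors, and the use of `std::time::Duration` in place of the kernel `Delta` type are assumptions of this sketch, not the driver's code.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Hypothetical mirror of the driver's completion modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IrqMode {
    None,
    Soft,
    Timer,
}

impl TryFrom<u8> for IrqMode {
    type Error = &'static str;

    /// 0 = none, 1 = softirq, 2 = timer; anything else is rejected.
    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::None),
            1 => Ok(Self::Soft),
            2 => Ok(Self::Timer),
            _ => Err("EINVAL"),
        }
    }
}

/// Return the delay to apply before completing a request, if any.
/// The value is range-checked first, regardless of mode, like the
/// `try_into()` in the module init path.
fn completion_delay(
    mode: IrqMode,
    completion_nsec: u64,
) -> Result<Option<Duration>, &'static str> {
    let nanos = i64::try_from(completion_nsec)
        .map_err(|_| "completion_nsec does not fit in i64")?;
    Ok(match mode {
        // `nanos` is non-negative here, so the cast back is lossless.
        IrqMode::Timer => Some(Duration::from_nanos(nanos as u64)),
        IrqMode::None | IrqMode::Soft => None,
    })
}
```

With the default of 10,000 ns, only `IrqMode::Timer` produces a delay; the other modes complete without one.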
drivers/block/rnull/configfs.rs | 34 ++++++++++++++++++++--
drivers/block/rnull/rnull.rs | 63 ++++++++++++++++++++++++++++++++++++++---
2 files changed, 90 insertions(+), 7 deletions(-)
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index fd309fc17e66..83b474f6da60 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -25,11 +25,15 @@
kstrtobool_bytes,
CString, //
},
- sync::Mutex, //
+ sync::Mutex,
+ time, //
};
use macros::{
+ configfs_attribute,
configfs_simple_bool_field,
- configfs_simple_field, //
+ configfs_simple_field,
+ show_field,
+ store_number_with_power_check, //
};
mod macros;
@@ -56,7 +60,7 @@ impl AttributeOperations<0> for Config {
fn show(_this: &Config, page: &mut [u8; PAGE_SIZE]) -> Result<usize> {
let mut writer = kernel::str::Formatter::new(page);
- writer.write_str("blocksize,size,rotational,irqmode\n")?;
+ writer.write_str("blocksize,size,rotational,irqmode,completion_nsec\n")?;
Ok(writer.bytes_written())
}
}
@@ -79,6 +83,7 @@ fn make_group(
rotational: 2,
size: 3,
irqmode: 4,
+ completion_nsec: 5,
],
};
@@ -94,6 +99,7 @@ fn make_group(
disk: None,
capacity_mib: 4096,
irq_mode: IRQMode::None,
+ completion_time: time::Delta::ZERO,
name: name.try_into()?,
}),
}),
@@ -106,6 +112,7 @@ fn make_group(
pub(crate) enum IRQMode {
None,
Soft,
+ Timer,
}
impl TryFrom<u8> for IRQMode {
@@ -115,6 +122,7 @@ fn try_from(value: u8) -> Result<Self> {
match value {
0 => Ok(Self::None),
1 => Ok(Self::Soft),
+ 2 => Ok(Self::Timer),
_ => Err(EINVAL),
}
}
@@ -125,11 +133,22 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::None => f.write_str("0")?,
Self::Soft => f.write_str("1")?,
+ Self::Timer => f.write_str("2")?,
}
Ok(())
}
}
+/// Wraps [`time::Delta`] to render the value as a bare nanosecond count for
+/// configfs attributes that historically used this format.
+struct DeltaDisplay(time::Delta);
+
+impl kernel::fmt::Display for DeltaDisplay {
+ fn fmt(&self, f: &mut kernel::fmt::Formatter<'_>) -> kernel::fmt::Result {
+ f.write_fmt(kernel::prelude::fmt!("{}", self.0.as_nanos()))
+ }
+}
+
#[pin_data]
pub(crate) struct DeviceConfig {
#[pin]
@@ -144,6 +163,7 @@ struct DeviceConfigInner {
rotational: bool,
capacity_mib: u64,
irq_mode: IRQMode,
+ completion_time: time::Delta,
disk: Option<GenDisk<NullBlkDevice>>,
}
@@ -174,6 +194,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
guard.rotational,
guard.capacity_mib,
guard.irq_mode,
+ guard.completion_time,
)?);
guard.powered = true;
} else if guard.powered && !power_op {
@@ -189,6 +210,13 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
configfs_simple_bool_field!(DeviceConfig, 2, rotational);
configfs_simple_field!(DeviceConfig, 3, capacity_mib, u64);
configfs_simple_field!(DeviceConfig, 4, irq_mode, IRQMode);
+configfs_attribute!(DeviceConfig, 5,
+ show: |this, page| show_field(DeltaDisplay(this.data.lock().completion_time), page),
+ store: |this, page| store_number_with_power_check(this, page, |data, value: i64| {
+ data.completion_time = time::Delta::from_nanos(value);
+ Ok(())
+ })
+);
impl core::str::FromStr for IRQMode {
type Err = Error;
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index dd7a30519870..3e7a47e6d0e5 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -28,6 +28,15 @@
Arc,
Mutex, //
},
+ time::{
+ hrtimer::{
+ HrTimerCallback,
+ HrTimerCallbackContext,
+ HrTimerPointer,
+ HrTimerRestart, //
+ },
+ Delta,
+ },
types::{
OwnableRefCounted,
Owned, //
@@ -59,7 +68,11 @@
},
irqmode: u8 {
default: 0,
- description: "IRQ completion handler. 0-none, 1-softirq",
+ description: "IRQ completion handler. 0-none, 1-softirq, 2-timer",
+ },
+ completion_nsec: u64 {
+ default: 10_000,
+ description: "Time in ns to complete a request in hardware. Default: 10,000ns",
},
},
}
@@ -79,6 +92,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
let mut disks = KVec::new();
let defer_init = move || -> Result<_, Error> {
+ let completion_time: i64 = module_parameters::completion_nsec.value().try_into()?;
for i in 0..module_parameters::nr_devices.value() {
let name = CString::try_from_fmt(fmt!("rnullb{}", i))?;
@@ -88,6 +102,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
module_parameters::rotational.value(),
module_parameters::gb.value() * 1024,
module_parameters::irqmode.value().try_into()?,
+ Delta::from_nanos(completion_time),
)?;
disks.push(disk, GFP_KERNEL)?;
}
@@ -111,10 +126,17 @@ fn new(
rotational: bool,
capacity_mib: u64,
irq_mode: IRQMode,
+ completion_time: Delta,
) -> Result<GenDisk<Self>> {
let tagset = Arc::pin_init(TagSet::new(1, 256, 1), GFP_KERNEL)?;
- let queue_data = Box::new(QueueData { irq_mode }, GFP_KERNEL)?;
+ let queue_data = Box::new(
+ QueueData {
+ irq_mode,
+ completion_time,
+ },
+ GFP_KERNEL,
+ )?;
gen_disk::GenDiskBuilder::new()
.capacity_sectors(capacity_mib << (20 - block::SECTOR_SHIFT))
@@ -127,15 +149,43 @@ fn new(
struct QueueData {
irq_mode: IRQMode,
+ completion_time: Delta,
+}
+
+#[pin_data]
+struct Pdu {
+ #[pin]
+ timer: kernel::time::hrtimer::HrTimer<Self>,
+}
+
+impl HrTimerCallback for Pdu {
+ type Pointer<'a> = ARef<mq::Request<NullBlkDevice>>;
+
+ fn run(this: Self::Pointer<'_>, _context: HrTimerCallbackContext<'_, Self>) -> HrTimerRestart {
+ OwnableRefCounted::try_from_shared(this)
+ .map_err(|_e| kernel::error::code::EIO)
+ .expect("Failed to complete request")
+ .end_ok();
+ HrTimerRestart::NoRestart
+ }
+}
+
+kernel::impl_has_hr_timer! {
+ impl HasHrTimer<Self> for Pdu {
+ mode: kernel::time::hrtimer::RelativeMode<kernel::time::Monotonic>,
+ field: self.timer,
+ }
}
#[vtable]
impl Operations for NullBlkDevice {
type QueueData = KBox<QueueData>;
- type RequestData = ();
+ type RequestData = Pdu;
fn new_request_data() -> impl PinInit<Self::RequestData> {
- Ok(())
+ pin_init!(Pdu {
+ timer <- kernel::time::hrtimer::HrTimer::new(),
+ })
}
#[inline(always)]
@@ -143,6 +193,11 @@ fn queue_rq(queue_data: &QueueData, rq: Owned<mq::Request<Self>>, _is_last: bool
match queue_data.irq_mode {
IRQMode::None => rq.end_ok(),
IRQMode::Soft => mq::Request::complete(rq.into()),
+ IRQMode::Timer => {
+ OwnableRefCounted::into_shared(rq)
+ .start(queue_data.completion_time)
+ .dismiss();
+ }
}
Ok(())
}
--
2.51.2
* [PATCH v2 54/83] block: rust: add `map_queues` support
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux, Andreas Hindborg
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
From: Andreas Hindborg <a.hindborg@samsung.com>
Add support for the `map_queues` callback to the Rust block layer
bindings. This callback allows drivers to customize the mapping between
CPUs and hardware queues.
The callback receives a reference to the `TagSet`, and drivers
can use the `TagSet::update_maps` method to configure the mappings for
each queue type.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
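The optional-callback wiring used for `map_queues` can be modelled outside the kernel. In this sketch the `HAS_MAP_QUEUES` constant is written by hand, whereas in the kernel it is generated by `#[vtable]`. `Ops`, `VTable`, `VTableFor`, `Plain`, `RoundRobin`, and the CPU-to-queue slice are hypothetical names for illustration; the round-robin mapping is an example policy, not what blk-mq does by default.

```rust
use core::marker::PhantomData;

/// Hypothetical driver trait with one optional callback.
trait Ops {
    /// Set to `true` by implementers that provide `map_queues`.
    const HAS_MAP_QUEUES: bool = false;

    /// Fill `cpu_to_queue[cpu]` with the hardware queue index for `cpu`.
    fn map_queues(_cpu_to_queue: &mut [usize]) {
        unreachable!("optional callback not provided")
    }
}

/// Stand-in for the C ops table: `None` means "use the default mapping".
struct VTable {
    map_queues: Option<fn(&mut [usize])>,
}

struct VTableFor<T>(PhantomData<T>);

impl<T: Ops> VTableFor<T> {
    /// The callback slot is only populated when the driver opts in.
    const VTABLE: VTable = VTable {
        map_queues: if T::HAS_MAP_QUEUES {
            Some(T::map_queues as fn(&mut [usize]))
        } else {
            None
        },
    };
}

/// A driver that keeps the default mapping.
struct Plain;
impl Ops for Plain {}

/// A driver that spreads CPUs over two queues.
struct RoundRobin;
impl Ops for RoundRobin {
    const HAS_MAP_QUEUES: bool = true;

    fn map_queues(cpu_to_queue: &mut [usize]) {
        for (cpu, queue) in cpu_to_queue.iter_mut().enumerate() {
            *queue = cpu % 2;
        }
    }
}
```

Because the slot is decided in a `const`, a driver that does not implement the callback leaves the table entry empty and never reaches the default body.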
rust/kernel/block/mq/operations.rs | 29 +++++++++++++++++++++++++++--
rust/kernel/block/mq/tag_set.rs | 13 +++++++++++++
2 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
index 71d4192d627f..8a418bf0f3ba 100644
--- a/rust/kernel/block/mq/operations.rs
+++ b/rust/kernel/block/mq/operations.rs
@@ -12,7 +12,8 @@
gen_disk::GenDiskRef,
request::RequestDataWrapper,
IdleRequest,
- Request, //
+ Request,
+ TagSet, //
},
},
error::{
@@ -126,6 +127,11 @@ fn report_zones(
) -> Result<u32> {
Err(ENOTSUPP)
}
+
+ /// Called by the kernel to map submission queues to CPU cores.
+ fn map_queues(_tag_set: &TagSet<Self>) {
+ build_error!(crate::error::VTABLE_DEFAULT_ERROR)
+ }
}
/// A vtable for blk-mq to interact with a block device driver.
@@ -418,6 +424,21 @@ impl<T: Operations> OperationsVTable<T> {
})
}
+ /// This function is called by the C kernel. A pointer to this function is
+ /// installed in the `blk_mq_ops` vtable for the driver.
+ ///
+ /// # Safety
+ ///
+ /// This function may only be called by blk-mq C infrastructure. `tag_set`
+ /// must be a pointer to a valid and initialized `TagSet<T>`. The pointee
+ /// must be valid for use as a reference for at least the duration of this call.
+ unsafe extern "C" fn map_queues_callback(tag_set: *mut bindings::blk_mq_tag_set) {
+ // SAFETY: The safety requirements of this function satisfy the
+ // requirements of `TagSet::from_ptr`.
+ let tag_set = unsafe { TagSet::from_ptr(tag_set) };
+ T::map_queues(tag_set);
+ }
+
const VTABLE: bindings::blk_mq_ops = bindings::blk_mq_ops {
queue_rq: Some(Self::queue_rq_callback),
queue_rqs: None,
@@ -439,7 +460,11 @@ impl<T: Operations> OperationsVTable<T> {
exit_request: Some(Self::exit_request_callback),
cleanup_rq: None,
busy: None,
- map_queues: None,
+ map_queues: if T::HAS_MAP_QUEUES {
+ Some(Self::map_queues_callback)
+ } else {
+ None
+ },
#[cfg(CONFIG_BLK_DEBUG_FS)]
show_rq: None,
};
diff --git a/rust/kernel/block/mq/tag_set.rs b/rust/kernel/block/mq/tag_set.rs
index 157c47f64334..d3e93ad98b6e 100644
--- a/rust/kernel/block/mq/tag_set.rs
+++ b/rust/kernel/block/mq/tag_set.rs
@@ -116,6 +116,19 @@ pub fn flags(&self) -> Flags {
let flags_raw = unsafe { (*this).flags };
Flags::try_from(flags_raw).expect("Expected valid flags from C struct")
}
+
+ /// Create a `TagSet<T>` from a raw pointer.
+ ///
+ /// # Safety
+ ///
+ /// `ptr` must be a pointer to a valid and initialized `TagSet<T>`. There
+ /// may be no other mutable references to the tag set. The pointee must be
+ /// live and valid at least for the duration of the returned lifetime `'a`.
+ pub(crate) unsafe fn from_ptr<'a>(ptr: *mut bindings::blk_mq_tag_set) -> &'a Self {
+ // SAFETY: By the safety requirements of this function, `ptr` is valid
+ // for use as a reference for the duration of `'a`.
+ unsafe { &*(ptr.cast::<Self>()) }
+ }
}
#[pinned_drop]
--
2.51.2
* [PATCH v2 06/83] block: rust: fix generation of bindings to `BLK_STS_.*`
From: Andreas Hindborg @ 2026-06-09 19:07 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Bindgen generates constants for CPP integer literals as u32. The
`blk_status_t` type is defined as `u8` but the variants of the type are
defined as integer literals via CPP macros. Thus the defined variants of
the type are not of the same type as the type itself.
Prevent bindgen from emitting generated bindings for the `BLK_STS_.*`
defines and instead define constants manually in `bindings_helper.h`.
Also remove casts that are no longer necessary.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
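The type mismatch described above can be reproduced in a few lines of plain Rust. The modules `old_bindings` and `new_bindings` are hypothetical stand-ins for the generated bindings before and after this patch, and the single constant shown is only an illustration.

```rust
/// Stand-in for the C typedef; the kernel defines `blk_status_t` as `u8`.
#[allow(non_camel_case_types)]
type blk_status_t = u8;

/// What an integer-literal macro looks like when emitted as `u32`.
mod old_bindings {
    pub const BLK_STS_IOERR: u32 = 10;
}

/// What a typed `RUST_CONST_HELPER_*` constant looks like instead.
mod new_bindings {
    pub const BLK_STS_IOERR: super::blk_status_t = 10;
}

/// Before: every use needs a cast to the status type.
fn status_before() -> blk_status_t {
    old_bindings::BLK_STS_IOERR as blk_status_t
}

/// After: the constant already has the right type.
fn status_after() -> blk_status_t {
    new_bindings::BLK_STS_IOERR
}
```

Both functions return the same value; the difference is that the second one no longer needs the `as` cast, which is the cast this patch removes from `operations.rs`.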
rust/bindgen_parameters | 6 ++++++
rust/bindings/bindings_helper.h | 19 +++++++++++++++++++
rust/kernel/block/mq/operations.rs | 17 +++++++++++++----
3 files changed, 38 insertions(+), 4 deletions(-)
diff --git a/rust/bindgen_parameters b/rust/bindgen_parameters
index 6f02d9720ad2..128731e84775 100644
--- a/rust/bindgen_parameters
+++ b/rust/bindgen_parameters
@@ -5,6 +5,12 @@
--blocklist-type __kernel_s?size_t
--blocklist-type __kernel_ptrdiff_t
+# Bindgen cannot extract values from the `((__force blk_status_t)N)`
+# CPP-macro form used by most of these and emits the few it can extract
+# as `u32`. Block them entirely; the `RUST_CONST_HELPER_BLK_STS_*`
+# definitions in `bindings_helper.h` expose them as `blk_status_t`.
+--blocklist-item BLK_STS_.*
+
--opaque-type xregs_state
--opaque-type desc_struct
--opaque-type arch_lbr_state
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 9da216faad51..b1fb3afee4ca 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -119,6 +119,25 @@ const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO;
const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM;
const gfp_t RUST_CONST_HELPER___GFP_NOWARN = ___GFP_NOWARN;
const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ROTATIONAL = BLK_FEAT_ROTATIONAL;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_OK = BLK_STS_OK;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_NOTSUPP = BLK_STS_NOTSUPP;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_TIMEOUT = BLK_STS_TIMEOUT;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_NOSPC = BLK_STS_NOSPC;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_TRANSPORT = BLK_STS_TRANSPORT;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_TARGET = BLK_STS_TARGET;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_RESV_CONFLICT = BLK_STS_RESV_CONFLICT;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_MEDIUM = BLK_STS_MEDIUM;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_PROTECTION = BLK_STS_PROTECTION;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_RESOURCE = BLK_STS_RESOURCE;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_IOERR = BLK_STS_IOERR;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_DM_REQUEUE = BLK_STS_DM_REQUEUE;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_AGAIN = BLK_STS_AGAIN;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_DEV_RESOURCE = BLK_STS_DEV_RESOURCE;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_ZONE_OPEN_RESOURCE = BLK_STS_ZONE_OPEN_RESOURCE;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_ZONE_ACTIVE_RESOURCE = BLK_STS_ZONE_ACTIVE_RESOURCE;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_OFFLINE = BLK_STS_OFFLINE;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_DURATION_LIMIT = BLK_STS_DURATION_LIMIT;
+const blk_status_t RUST_CONST_HELPER_BLK_STS_INVAL = BLK_STS_INVAL;
const fop_flags_t RUST_CONST_HELPER_FOP_UNSIGNED_OFFSET = FOP_UNSIGNED_OFFSET;
const xa_mark_t RUST_CONST_HELPER_XA_PRESENT = XA_PRESENT;
diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
index 89029f468f44..6b2fcd76372e 100644
--- a/rust/kernel/block/mq/operations.rs
+++ b/rust/kernel/block/mq/operations.rs
@@ -6,10 +6,19 @@
use crate::{
bindings,
- block::mq::{request::RequestDataWrapper, Request},
- error::{from_result, Result},
+ block::mq::{
+ request::RequestDataWrapper,
+ Request, //
+ },
+ error::{
+ from_result,
+ Result, //
+ },
prelude::*,
- sync::{aref::ARef, Refcount},
+ sync::{
+ aref::ARef,
+ Refcount, //
+ },
types::ForeignOwnable,
};
use core::marker::PhantomData;
@@ -124,7 +133,7 @@ impl<T: Operations> OperationsVTable<T> {
if let Err(e) = ret {
e.to_blk_status()
} else {
- bindings::BLK_STS_OK as bindings::blk_status_t
+ bindings::BLK_STS_OK
}
}
--
2.51.2
* [PATCH v2 44/83] block: rnull: add bandwidth limiting
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add bandwidth limiting support to rnull via the `mbps` configfs
attribute. When set to a non-zero value, the driver limits I/O
throughput to the specified rate in mebibytes per second (MiB/s).
The implementation uses a token bucket algorithm to enforce the rate
limit, delaying request completion when the limit is exceeded.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
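A minimal userspace sketch of the accounting in this patch, assuming a 20 ms timer interval (50 intervals per second) as in `BANDWIDTH_TIMER_INTERVAL`. `Budget`, `from_mbps`, `charge`, and `tick` are hypothetical names. The sketch models only the byte counter and its periodic reset, not the stopping and restarting of hardware queues.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Number of 20 ms timer intervals in one second.
const INTERVALS_PER_SEC: u64 = 50;

/// Hypothetical per-device byte budget, refilled on every timer tick.
struct Budget {
    /// Bytes allowed per interval; 0 means "no limit".
    per_interval: u64,
    /// Bytes charged since the last tick.
    used: AtomicU64,
}

impl Budget {
    /// Convert a MiB/s setting into a per-interval byte budget.
    fn from_mbps(mbps: u32) -> Self {
        Self {
            per_interval: u64::from(mbps) * (1 << 20) / INTERVALS_PER_SEC,
            used: AtomicU64::new(0),
        }
    }

    /// Charge `bytes` and report whether the request may proceed.
    /// The bytes are counted even when the request is refused.
    fn charge(&self, bytes: u64) -> bool {
        if self.per_interval == 0 {
            return true;
        }
        let total = self.used.fetch_add(bytes, Ordering::Relaxed) + bytes;
        total <= self.per_interval
    }

    /// Timer callback: start a fresh interval.
    fn tick(&self) {
        self.used.store(0, Ordering::Relaxed);
    }
}
```

With `mbps = 100` the budget is 2 MiB per interval, so two 1 MiB requests pass, a third request is refused until the next `tick`, and a setting of `0` never refuses.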
drivers/block/rnull/configfs.rs | 7 ++-
drivers/block/rnull/rnull.rs | 111 +++++++++++++++++++++++++++++++++++-----
2 files changed, 105 insertions(+), 13 deletions(-)
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index 4df0b748596a..59217d75f46b 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -104,6 +104,7 @@ fn make_group(
badblocks_once: 13,
badblocks_partial_io: 14,
cache_size_mib: 15,
+ mbps: 16,
],
};
@@ -135,6 +136,7 @@ fn make_group(
GFP_KERNEL
)?,
cache_size_mib: 0,
+ mbps: 0,
}),
}),
core::iter::empty(),
@@ -209,6 +211,7 @@ struct DeviceConfigInner {
bad_blocks_partial_io: bool,
cache_size_mib: u64,
disk_storage: Arc<DiskStorage>,
+ mbps: u32,
}
#[vtable]
@@ -248,6 +251,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
bad_blocks_once: guard.bad_blocks_once,
bad_blocks_partial_io: guard.bad_blocks_partial_io,
storage: guard.disk_storage.clone(),
+ bandwidth_limit: u64::from(guard.mbps) * 2u64.pow(20),
})?);
guard.powered = true;
} else if guard.powered && !power_op {
@@ -259,7 +263,6 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
}
}
-// DiskStorage::new(cache_size_mib << 20, block_size as usize),
configfs_simple_field!(DeviceConfig, 1, block_size, u32, check GenDiskBuilder::validate_block_size);
configfs_simple_bool_field!(DeviceConfig, 2, rotational);
configfs_simple_field!(DeviceConfig, 3, capacity_mib, u64);
@@ -417,3 +420,5 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
Ok(())
})
);
+
+configfs_simple_field!(DeviceConfig, 16, mbps, u32);
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 6ceba23a4d3e..1dda8d717b95 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -25,7 +25,8 @@
self,
gen_disk::{
self,
- GenDisk, //
+ GenDisk,
+ GenDiskRef, //
},
Operations,
TagSet, //
@@ -37,25 +38,32 @@
Result, //
},
ffi,
+ impl_has_hr_timer,
memalloc_scope,
new_mutex,
new_spinlock,
pr_info,
prelude::*,
+ revocable::Revocable,
str::CString,
sync::{
aref::ARef,
atomic::{
ordering,
Atomic, //
- }, //
+ },
Arc,
+ ArcBorrow,
Mutex,
+ SetOnce,
SpinLock,
- SpinLockGuard,
+ SpinLockGuard, //
},
time::{
hrtimer::{
+ self,
+ ArcHrTimerHandle,
+ HrTimer,
HrTimerCallback,
HrTimerCallbackContext,
HrTimerPointer,
@@ -127,6 +135,10 @@
default: false,
description: "No IO scheduler",
},
+ mbps: u32 {
+ default: 0,
+ description: "Max bandwidth in MiB/s. 0 means no limit.",
+ },
},
}
@@ -172,6 +184,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
bad_blocks_once: false,
bad_blocks_partial_io: false,
storage: Arc::pin_init(DiskStorage::new(0, block_size as usize), GFP_KERNEL)?,
+ bandwidth_limit: u64::from(module_parameters::mbps.value()) * 2u64.pow(20),
})?;
disks.push(disk, GFP_KERNEL)?;
}
@@ -202,6 +215,7 @@ struct NullBlkOptions<'a> {
bad_blocks_once: bool,
bad_blocks_partial_io: bool,
storage: Arc<DiskStorage>,
+ bandwidth_limit: u64,
}
#[pin_data]
@@ -214,9 +228,18 @@ struct NullBlkDevice {
bad_blocks: Arc<BadBlocks>,
bad_blocks_once: bool,
bad_blocks_partial_io: bool,
+ bandwidth_limit: u64,
+ #[pin]
+ bandwidth_timer: HrTimer<Self>,
+ bandwidth_bytes: Atomic<u64>,
+ #[pin]
+ bandwidth_timer_handle: SpinLock<Option<ArcHrTimerHandle<Self>>>,
+ disk: SetOnce<Arc<Revocable<GenDiskRef<Self>>>>,
}
impl NullBlkDevice {
+ const BANDWIDTH_TIMER_INTERVAL: Delta = Delta::from_millis(20);
+
fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
let NullBlkOptions {
name,
@@ -234,6 +257,7 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
bad_blocks_once,
bad_blocks_partial_io,
storage,
+ bandwidth_limit,
} = options;
let mut flags = mq::tag_set::Flags::default();
@@ -268,7 +292,7 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
GFP_KERNEL,
)?;
- let queue_data = Box::try_pin_init(
+ let queue_data = Arc::try_pin_init(
try_pin_init!(Self {
storage,
irq_mode,
@@ -278,6 +302,11 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
bad_blocks,
bad_blocks_once,
bad_blocks_partial_io,
+ bandwidth_limit: bandwidth_limit / 50,
+ bandwidth_timer <- HrTimer::new(),
+ bandwidth_bytes: Atomic::new(0),
+ bandwidth_timer_handle <- new_spinlock!(None),
+ disk: SetOnce::new(),
}),
GFP_KERNEL,
)?;
@@ -294,7 +323,10 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
.max_hw_discard_sectors(ffi::c_uint::MAX >> block::SECTOR_SHIFT);
}
- builder.build(fmt!("{}", name.to_str()?), tagset, queue_data)
+ let disk = builder.build(fmt!("{}", name.to_str()?), tagset, queue_data)?;
+ let queue_data: ArcBorrow<'_, Self> = disk.queue_data();
+ queue_data.disk.populate(disk.get_ref());
+ Ok(disk)
}
fn sheaf_size() -> usize {
@@ -522,6 +554,36 @@ fn end_request(rq: Owned<mq::Request<Self>>) {
}
}
+impl_has_hr_timer! {
+ impl HasHrTimer<Self> for NullBlkDevice {
+ mode: hrtimer::RelativeHardMode<kernel::time::Monotonic>,
+ field: self.bandwidth_timer,
+ }
+}
+
+impl HrTimerCallback for NullBlkDevice {
+ type Pointer<'a> = Arc<Self>;
+
+ fn run(
+ this: ArcBorrow<'_, Self>,
+ mut context: HrTimerCallbackContext<'_, Self>,
+ ) -> HrTimerRestart {
+ if this.bandwidth_bytes.load(ordering::Relaxed) == 0 {
+ return HrTimerRestart::NoRestart;
+ }
+
+ this.disk.as_ref().map(|disk| {
+ disk.try_access()
+ .map(|disk| disk.queue().start_stopped_hw_queues_async())
+ });
+
+ this.bandwidth_bytes.store(0, ordering::Relaxed);
+
+ context.forward_now(Self::BANDWIDTH_TIMER_INTERVAL);
+ HrTimerRestart::Restart
+ }
+}
+
struct HwQueueContext {
page: Option<KBox<disk_storage::NullBlockPage>>,
}
@@ -529,7 +591,7 @@ struct HwQueueContext {
#[pin_data]
struct Pdu {
#[pin]
- timer: kernel::time::hrtimer::HrTimer<Self>,
+ timer: HrTimer<Self>,
error: Atomic<u32>,
}
@@ -578,14 +640,14 @@ fn align_down<T>(value: T, to: T) -> T
#[vtable]
impl Operations for NullBlkDevice {
- type QueueData = Pin<KBox<Self>>;
+ type QueueData = Arc<Self>;
type RequestData = Pdu;
type TagSetData = ();
type HwData = Pin<KBox<SpinLock<HwQueueContext>>>;
fn new_request_data() -> impl PinInit<Self::RequestData> {
pin_init!(Pdu {
- timer <- kernel::time::hrtimer::HrTimer::new(),
+ timer <- HrTimer::new(),
error: Atomic::new(0),
})
}
@@ -593,14 +655,39 @@ fn new_request_data() -> impl PinInit<Self::RequestData> {
#[inline(always)]
fn queue_rq(
hw_data: Pin<&SpinLock<HwQueueContext>>,
- this: Pin<&Self>,
+ this: ArcBorrow<'_, Self>,
rq: Owned<mq::IdleRequest<Self>>,
_is_last: bool,
) -> BlkResult {
- let mut rq = rq.start();
let mut sectors = rq.sectors();
- Self::handle_bad_blocks(this.get_ref(), &mut rq, &mut sectors)?;
+ if this.bandwidth_limit != 0 {
+ if !this.bandwidth_timer.active() {
+ drop(this.bandwidth_timer_handle.lock().take());
+ let arc: Arc<_> = this.into();
+ *this.bandwidth_timer_handle.lock() =
+ Some(arc.start(Self::BANDWIDTH_TIMER_INTERVAL));
+ }
+
+ if this
+ .bandwidth_bytes
+ .fetch_add(u64::from(rq.bytes()), ordering::Relaxed)
+ + u64::from(rq.bytes())
+ > this.bandwidth_limit
+ {
+ rq.queue().stop_hw_queues();
+ if this.bandwidth_bytes.load(ordering::Relaxed) <= this.bandwidth_limit {
+ rq.queue().start_stopped_hw_queues_async();
+ }
+
+ return Err(kernel::block::error::code::BLK_STS_DEV_RESOURCE);
+ }
+ }
+
+ let mut rq = rq.start();
+
+ use core::ops::Deref;
+ Self::handle_bad_blocks(this.deref(), &mut rq, &mut sectors)?;
if this.memory_backed {
memalloc_scope!(let _noio: NoIo);
@@ -623,7 +710,7 @@ fn queue_rq(
Ok(())
}
- fn commit_rqs(_hw_data: Pin<&SpinLock<HwQueueContext>>, _queue_data: Pin<&Self>) {}
+ fn commit_rqs(_hw_data: Pin<&SpinLock<HwQueueContext>>, _queue_data: ArcBorrow<'_, Self>) {}
fn init_hctx(_tagset_data: (), _hctx_idx: u32) -> Result<Self::HwData> {
KBox::pin_init(new_spinlock!(HwQueueContext { page: None }), GFP_KERNEL)
--
2.51.2
* [PATCH v2 21/83] block: rust: add Request::sectors() method
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a new method to get the size of a request in number of sectors.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
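Not part of the commit: a minimal userspace sketch of the arithmetic behind
`sectors()`, assuming 512-byte sectors (a `SECTOR_SHIFT` of 9). The free
function and its parameter name are illustrative only; the real method reads
`__data_len` from the request.

```rust
/// Shift that converts a byte count into 512-byte sectors. The value 9 is an
/// assumption made for this sketch.
const SECTOR_SHIFT: u32 = 9;

/// Userspace model of `Request::sectors()`: the request length in bytes,
/// shifted right by `SECTOR_SHIFT`. Partial sectors are truncated.
fn sectors(data_len_bytes: u32) -> usize {
    (data_len_bytes as usize) >> SECTOR_SHIFT
}
```

With these assumptions a 4096-byte request spans 8 sectors, and a request
shorter than one sector reports 0.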
rust/kernel/block/mq/request.rs | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index 19bdf17de166..54fe580b7b42 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -183,6 +183,13 @@ pub fn sector(&self) -> usize {
unsafe { (*self.0.get()).__sector as usize }
}
+ /// Get the size of the request in number of sectors.
+ #[inline(always)]
+ pub fn sectors(&self) -> usize {
+ // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
+ (unsafe { (*self.0.get()).__data_len as usize }) >> crate::block::SECTOR_SHIFT
+ }
+
/// Return a pointer to the [`RequestDataWrapper`] stored in the private area
/// of the request structure.
///
--
2.51.2
* [PATCH v2 77/83] block: rnull: add fault injection support
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add fault injection support to rnull using the kernel fault injection
infrastructure. When `CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION` is enabled
(it depends on `CONFIG_FAULT_INJECTION_CONFIGFS`), users can inject request
timeouts, requeues, and `init_hctx` failures.
The fault injection points `timeout_inject`, `requeue_inject`, and
`init_hctx_fault_inject` are exposed as configfs default groups, allowing
per-device fault injection configuration.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
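Not part of the commit: a userspace sketch of how the requeue injection in
this patch alternates between its two paths. It assumes only what the diff
shows: a per-device counter is bumped with a relaxed `fetch_add`, and its low
bit picks the path. The enum and function names are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// The two ways an injected requeue can be carried out (names are
/// hypothetical, chosen for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum InjectedRequeue {
    /// Return the request to the caller as an error so the block layer
    /// requeues it.
    ReturnToBlockLayer,
    /// Requeue the request from the driver and report success.
    DriverRequeue,
}

/// Pick a path from the low bit of the previous counter value, so that
/// consecutive injections alternate between the two paths.
fn pick_requeue_path(selector: &AtomicU64) -> InjectedRequeue {
    if selector.fetch_add(1, Ordering::Relaxed) & 1 == 0 {
        InjectedRequeue::ReturnToBlockLayer
    } else {
        InjectedRequeue::DriverRequeue
    }
}
```

Starting from a counter of 0, the first injection takes the error-return
path, the second requeues from the driver, and so on.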
drivers/block/rnull/Kconfig | 11 ++++
drivers/block/rnull/configfs.rs | 57 ++++++++++++++++++-
drivers/block/rnull/rnull.rs | 121 +++++++++++++++++++++++++++++++++++++---
3 files changed, 180 insertions(+), 9 deletions(-)
diff --git a/drivers/block/rnull/Kconfig b/drivers/block/rnull/Kconfig
index 7bc5b376c128..1ade5d8c1799 100644
--- a/drivers/block/rnull/Kconfig
+++ b/drivers/block/rnull/Kconfig
@@ -11,3 +11,14 @@ config BLK_DEV_RUST_NULL
devices that can be configured via various configuration options.
If unsure, say N.
+
+config BLK_DEV_RUST_NULL_FAULT_INJECTION
+ bool "Support fault injection for Rust Null test block driver"
+ depends on BLK_DEV_RUST_NULL && FAULT_INJECTION_CONFIGFS
+ help
+ Enable fault injection support for the Rust null block driver. This
+ allows injecting errors into block I/O operations for testing error
+ handling paths and verifying system resilience. Fault injection is
+ configured through configfs alongside the null block device settings.
+
+ If unsure, say N.
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index d9246b9150f4..eaa7617e5ffa 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -48,6 +48,9 @@
mod macros;
+#[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+use kernel::fault_injection::FaultConfig;
+
pub(crate) fn subsystem(
shared_tag_set: Arc<TagSet<NullBlkDevice>>,
) -> impl PinInit<kernel::configfs::Subsystem<Config>, Error> {
@@ -132,10 +135,44 @@ fn make_group(
],
};
+ use kernel::configfs::CDefaultGroup;
+
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ let mut default_groups: KVec<Arc<dyn CDefaultGroup>> = KVec::new();
+
+ #[cfg(not(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION))]
+ let default_groups: KVec<Arc<dyn CDefaultGroup>> = KVec::new();
+
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ let timeout_inject = Arc::pin_init(
+ kernel::fault_injection::FaultConfig::new(c"timeout_inject"),
+ GFP_KERNEL,
+ )?;
+
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ let requeue_inject = Arc::pin_init(
+ kernel::fault_injection::FaultConfig::new(c"requeue_inject"),
+ GFP_KERNEL,
+ )?;
+
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ let init_hctx_inject = Arc::pin_init(
+ kernel::fault_injection::FaultConfig::new(c"init_hctx_fault_inject"),
+ GFP_KERNEL,
+ )?;
+
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ {
+ default_groups.push(timeout_inject.clone(), GFP_KERNEL)?;
+ default_groups.push(requeue_inject.clone(), GFP_KERNEL)?;
+ default_groups.push(init_hctx_inject.clone(), GFP_KERNEL)?;
+ }
+
let block_size = 4096;
Ok(configfs::Group::new(
name.try_into()?,
item_type,
+ // default_groups,
// TODO: cannot coerce new_mutex!() to impl PinInit<_, Error>, so put mutex inside
try_pin_init!(DeviceConfig {
data <- new_mutex!(DeviceConfigInner {
@@ -176,9 +213,15 @@ fn make_group(
zone_max_active: 0,
zone_append_max_sectors: u32::MAX,
fua: true,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ timeout_inject,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_inject,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject,
}),
}),
- core::iter::empty(),
+ default_groups,
))
}
}
@@ -263,6 +306,12 @@ struct DeviceConfigInner {
zone_max_active: u32,
zone_append_max_sectors: u32,
fua: bool,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ timeout_inject: Arc<FaultConfig>,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_inject: Arc<FaultConfig>,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject: Arc<FaultConfig>,
}
#[vtable]
@@ -320,6 +369,8 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
memory_backed: guard.memory_backed,
no_sched: guard.no_sched,
hw_queue_depth: guard.hw_queue_depth,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject: guard.init_hctx_inject.clone(),
},
zoned: guard.zoned,
zone_size_mib: guard.zone_size_mib,
@@ -329,6 +380,10 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
zone_max_active: guard.zone_max_active,
zone_append_max_sectors: guard.zone_append_max_sectors,
forced_unit_access: guard.fua,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_inject: guard.requeue_inject.clone(),
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ timeout_inject: guard.timeout_inject.clone(),
})?);
guard.powered = true;
} else if guard.powered && !power_op {
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 8e17b2b17a66..f909360ec70d 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -40,6 +40,7 @@
IoCompletionBatch,
Operations,
RequestList,
+ RequestTimeoutStatus,
TagSet, //
},
SECTOR_SHIFT,
@@ -90,6 +91,9 @@
};
use util::*;
+#[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+use kernel::fault_injection::FaultConfig;
+
module! {
type: NullBlkModule,
name: "rnull_mod",
@@ -203,6 +207,8 @@
},
}
+// TODO: Fault inject via params - requires module_params string support.
+
#[pin_data]
struct NullBlkModule {
#[pin]
@@ -241,6 +247,11 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
memory_backed,
no_sched,
hw_queue_depth,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject: Arc::pin_init(
+ FaultConfig::new(c"init_hctx_fault_inject"),
+ GFP_KERNEL,
+ )?,
})?;
let mut disks = KVec::new();
@@ -278,6 +289,11 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
memory_backed,
no_sched,
hw_queue_depth,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject: Arc::pin_init(
+ FaultConfig::new(c"init_hctx_fault_inject"),
+ GFP_KERNEL,
+ )?,
},
zoned: module_parameters::zoned.value(),
zone_size_mib: module_parameters::zone_size.value(),
@@ -287,6 +303,10 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
zone_max_active: module_parameters::zone_max_active.value(),
zone_append_max_sectors: module_parameters::zone_append_max_sectors.value(),
forced_unit_access: module_parameters::fua.value(),
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_inject: Arc::pin_init(FaultConfig::new(c"requeue_inject"), GFP_KERNEL)?,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ timeout_inject: Arc::pin_init(FaultConfig::new(c"timeout_inject"), GFP_KERNEL)?,
})?;
disks.push(disk, GFP_KERNEL)?;
}
@@ -328,6 +348,10 @@ struct NullBlkOptions<'a> {
#[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(dead_code))]
zone_append_max_sectors: u32,
forced_unit_access: bool,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_inject: Arc<FaultConfig>,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ timeout_inject: Arc<FaultConfig>,
}
#[pin_data]
@@ -350,6 +374,12 @@ struct NullBlkDevice {
#[cfg(CONFIG_BLK_DEV_ZONED)]
#[pin]
zoned: zoned::ZoneOptions,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_inject: Arc<FaultConfig>,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_selector: kernel::sync::atomic::Atomic<u64>,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ timeout_inject: Arc<FaultConfig>,
}
struct TagSetOptions {
@@ -359,6 +389,8 @@ struct TagSetOptions {
memory_backed: bool,
no_sched: bool,
hw_queue_depth: u32,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject: Arc<FaultConfig>,
}
impl NullBlkDevice {
@@ -372,6 +404,8 @@ fn build_tag_set(options: TagSetOptions) -> Result<Arc<TagSet<Self>>> {
memory_backed,
no_sched,
hw_queue_depth,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject,
} = options;
if home_node > kernel::numa::num_online_nodes().try_into()? {
@@ -404,6 +438,8 @@ fn build_tag_set(options: TagSetOptions) -> Result<Arc<TagSet<Self>>> {
NullBlkTagsetData {
queue_depth: hw_queue_depth,
queue_config,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject,
},
GFP_KERNEL,
)?,
@@ -446,6 +482,11 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
#[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(unused_variables))]
zone_append_max_sectors,
forced_unit_access,
+
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_inject,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ timeout_inject,
} = options;
let memory_backed = tag_set.memory_backed;
@@ -491,6 +532,12 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
zone_max_active,
zone_append_max_sectors,
})?,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_inject,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ requeue_selector: Atomic::new(0),
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ timeout_inject,
}),
GFP_KERNEL,
)?;
@@ -733,7 +780,9 @@ fn handle_bad_blocks(&self, rq: &mut Owned<mq::Request<Self>>, sectors: &mut u32
badblocks::BlockStatus::None => {}
badblocks::BlockStatus::Acknowledged(mut range)
| badblocks::BlockStatus::Unacknowledged(mut range) => {
- rq.data_ref().error.store(1, ordering::Relaxed);
+ rq.data_ref()
+ .error
+ .store(block::error::code::BLK_STS_IOERR.into(), ordering::Relaxed);
if self.bad_blocks_once {
self.bad_blocks.set_good(range.clone())?;
@@ -783,6 +832,22 @@ fn queue_rq_internal(
rq: Owned<mq::IdleRequest<Self>>,
_is_last: bool,
) -> Result<(), QueueRequestError> {
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ if rq.queue_data().requeue_inject.should_fail(1) {
+ if rq
+ .queue_data()
+ .requeue_selector
+ .fetch_add(1, ordering::Relaxed)
+ & 1
+ == 0
+ {
+ return Err(QueueRequestError { request: rq });
+ } else {
+ rq.requeue(true);
+ return Ok(());
+ }
+ }
+
if this.bandwidth_limit != 0 {
if !this.bandwidth_timer.active() {
drop(this.bandwidth_timer_handle.lock().take());
@@ -808,6 +873,12 @@ fn queue_rq_internal(
let mut rq = rq.start();
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ if rq.queue_data().timeout_inject.should_fail(1) {
+ rq.data_ref().fake_timeout.store(1, ordering::Relaxed);
+ return Ok(());
+ }
+
if rq.command() == mq::Command::Flush {
if this.memory_backed {
this.storage.flush(&hw_data);
@@ -831,12 +902,13 @@ fn queue_rq_internal(
Ok(())
})();
- if let Err(e) = status {
- // Do not overwrite existing error. We do not care whether this write fails.
- let _ = rq
- .data_ref()
- .error
- .cmpxchg(0, e.to_errno(), ordering::Relaxed);
+ if status.is_err() {
+ // Do not overwrite existing error.
+ let _ = rq.data_ref().error.cmpxchg(
+ 0,
+ kernel::block::error::code::BLK_STS_IOERR.into(),
+ ordering::Relaxed,
+ );
}
if rq.is_poll() {
@@ -914,7 +986,8 @@ struct HwQueueContext {
struct Pdu {
#[pin]
timer: HrTimer<Self>,
- error: Atomic<i32>,
+ error: Atomic<u32>,
+ fake_timeout: Atomic<u32>,
}
impl HrTimerCallback for Pdu {
@@ -939,6 +1012,8 @@ impl HasHrTimer<Self> for Pdu {
struct NullBlkTagsetData {
queue_depth: u32,
queue_config: Arc<Mutex<QueueConfig>>,
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ init_hctx_inject: Arc<FaultConfig>,
}
#[vtable]
@@ -952,6 +1027,7 @@ fn new_request_data() -> impl PinInit<Self::RequestData> {
pin_init!(Pdu {
timer <- HrTimer::new(),
error: Atomic::new(0),
+ fake_timeout: Atomic::new(0),
})
}
@@ -1006,6 +1082,11 @@ fn poll(
}
fn init_hctx(tagset_data: &NullBlkTagsetData, _hctx_idx: u32) -> Result<Self::HwData> {
+ #[cfg(CONFIG_BLK_DEV_RUST_NULL_FAULT_INJECTION)]
+ if tagset_data.init_hctx_inject.should_fail(1) {
+ return Err(EFAULT);
+ }
+
KBox::pin_init(
new_spinlock!(HwQueueContext {
page: None,
@@ -1067,4 +1148,28 @@ fn map_queues(tag_set: Pin<&mut TagSet<Self>>) {
})
.unwrap()
}
+
+ fn request_timeout(tag_set: &TagSet<Self>, qid: u32, tag: u32) -> RequestTimeoutStatus {
+ if let Some(request) = tag_set.tag_to_rq(qid, tag) {
+ pr_info!("Request timed out\n");
+ // Only fail requests that are faking timeouts. Requests that time
+ // out due to memory pressure will be completed normally.
+ if request.data_ref().fake_timeout.load(ordering::Relaxed) != 0 {
+ request.data_ref().error.store(
+ block::error::code::BLK_STS_TIMEOUT.into(),
+ ordering::Relaxed,
+ );
+ request.data_ref().fake_timeout.store(0, ordering::Relaxed);
+
+ if let Ok(request) = OwnableRefCounted::try_from_shared(request) {
+ Self::end_request(request);
+ return RequestTimeoutStatus::Completed;
+ }
+ kernel::pr_warn_once!("Timed out request could not be completed\n");
+ }
+ } else {
+ kernel::pr_warn_once!("Timed out request not found in timeout handler\n");
+ }
+ RequestTimeoutStatus::RetryLater
+ }
}
--
2.51.2
* [PATCH v2 73/83] block: rust: add `TagSet::tag_to_rq`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a way for block device drivers to obtain a `Request` from a tag. This
is backed by the C `blk_mq_tag_to_rq` but with added checks to ensure
memory safety.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
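Not part of the commit: a userspace sketch of the "increment only if
non-zero" step that `tag_to_rq` performs on the request refcount. It uses
`std` atomics in place of the kernel ones, and the function name is
hypothetical. One deliberate simplification: the patch spins while the count
is 0, whereas this sketch returns `false` so it can be exercised without a
second thread.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Try to take one additional shared reference.
///
/// Returns `false` if the count is 0, which in the patch means the request
/// is not owned by the driver or is held exclusively. The patch spins in
/// that case instead of returning.
fn try_get_shared(refcount: &AtomicU64) -> bool {
    loop {
        // Acquire load, mirroring the patch: pairs with the release store
        // made when exclusive ownership is given up.
        let prev = refcount.load(Ordering::Acquire);
        if prev == 0 {
            return false;
        }
        // Relaxed compare-exchange: no other operation has to be ordered
        // with the increment itself.
        if refcount
            .compare_exchange(prev, prev + 1, Ordering::Relaxed, Ordering::Relaxed)
            .is_ok()
        {
            return true;
        }
        // Another thread changed the count; reload with acquire and retry.
    }
}
```

A count of 1 becomes 2 and the call succeeds; a count of 0 is left untouched.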
rust/helpers/blk.c | 6 ++++
rust/kernel/block/mq/tag_set.rs | 66 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 71 insertions(+), 1 deletion(-)
diff --git a/rust/helpers/blk.c b/rust/helpers/blk.c
index 422289d617ae..1f3e5c661096 100644
--- a/rust/helpers/blk.c
+++ b/rust/helpers/blk.c
@@ -53,3 +53,9 @@ __rust_helper struct request *rust_helper_rq_list_peek(struct rq_list *rl)
{
return rq_list_peek(rl);
}
+
+__rust_helper struct request *
+rust_helper_blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
+{
+ return blk_mq_tag_to_rq(tags, tag);
+}
diff --git a/rust/kernel/block/mq/tag_set.rs b/rust/kernel/block/mq/tag_set.rs
index e89c76987b54..66b6a30a9e66 100644
--- a/rust/kernel/block/mq/tag_set.rs
+++ b/rust/kernel/block/mq/tag_set.rs
@@ -6,7 +6,6 @@
use crate::{
alloc::NumaNode,
- bindings,
block::mq::{
operations::OperationsVTable,
request::RequestDataWrapper,
@@ -17,7 +16,9 @@
Result, //
},
prelude::*,
+ sync::atomic::ordering,
types::{
+ ARef,
ForeignOwnable,
Opaque, //
},
@@ -39,6 +40,8 @@
Flags, //
};
+use super::Request;
+
/// A wrapper for the C `struct blk_mq_tag_set`.
///
/// `struct blk_mq_tag_set` contains a `struct list_head` and so must be pinned.
@@ -193,6 +196,67 @@ pub fn data(&self) -> <T::TagSetData as ForeignOwnable>::Borrowed<'_> {
// converted back with `from_foreign` while `&self` is live.
unsafe { T::TagSetData::borrow(ptr) }
}
+
+ /// Obtain a shared reference to a request.
+ ///
+ /// This method will hang if the request is not owned by the driver, or if
+ /// the driver holds an [`Owned<Request>`] reference to the request.
+ pub fn tag_to_rq(&self, qid: u32, tag: u32) -> Option<ARef<Request<T>>> {
+ if qid >= self.hw_queue_count() {
+ kernel::pr_warn_once!("Invalid queue id: {qid}\n");
+ return None;
+ }
+
+ // SAFETY: We checked that `qid` is within bounds.
+ let tags = unsafe { *(*self.inner.get()).tags.add(qid as usize) };
+
+ // SAFETY: We checked `qid` for overflow above, so `tags` is valid.
+ let rq_ptr = unsafe { bindings::blk_mq_tag_to_rq(tags, tag) };
+ if rq_ptr.is_null() {
+ None
+ } else {
+ // SAFETY: If `rq_ptr` is not null, it is a valid request pointer.
+ let refcount_ptr = unsafe {
+ RequestDataWrapper::refcount_ptr(
+ Request::wrapper_ptr(rq_ptr.cast::<Request<T>>()).as_ptr(),
+ )
+ };
+
+ // SAFETY: The refcount was initialized in `init_request_callback` and is never
+ // referenced mutably.
+ let refcount_ref = unsafe { &*refcount_ptr };
+
+ let atomic_ref = refcount_ref.as_atomic();
+
+ // It is possible for an interrupt to arrive faster than the last
+ // change to the refcount, so retry if the refcount is not what we
+ // think it should be.
+ loop {
+ // Load acquire to sync with store release of `Owned<Request>`
+ // being destroyed (prevent mutable access overlapping shared
+ // access).
+ let prev = atomic_ref.load(ordering::Acquire);
+
+ if prev >= 1 {
+ // Store relaxed as no other operations need to happen strictly
+ // before or after the increment.
+ match atomic_ref.cmpxchg(prev, prev + 1, ordering::Relaxed) {
+ Ok(_) => break,
+ // NOTE: We cannot use the load part of a failed cmpxchg as it is always
+ // relaxed.
+ Err(_) => continue,
+ }
+ } else {
+ // We are probably waiting to observe a refcount increment.
+ core::hint::spin_loop();
+ continue;
+ };
+ }
+
+ // SAFETY: We checked above that `rq_ptr` is valid for use as an `ARef`.
+ Some(unsafe { Request::aref_from_raw(rq_ptr) })
+ }
+ }
}
#[pinned_drop]
--
2.51.2
* [PATCH v2 38/83] block: rust: introduce an idle type state for `Request`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Block device drivers need to invoke `blk_mq_start_request` on a request to
indicate that they have started processing the request. This function may
only be called once after a request has been issued to a driver. For Rust
block device drivers, the Rust abstractions handle this call. However, in
some situations a driver may want to control when a request is started.
Thus, expose the start method to Rust block device drivers.
To ensure the method is not called more than once, introduce a type state
for `Request`. Requests are issued as `IdleRequest` and transition to
`Request` when the `start` method is called.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
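Not part of the commit: a userspace sketch of the type-state idea described
above. The types here are plain structs, not the kernel abstractions, and
their fields and constructors are hypothetical. The point is that `start`
takes `self` by value, so a second call on the same request does not compile.

```rust
/// Hypothetical stand-in for a request that has not been started.
struct IdleRequest {
    bytes: u32,
}

/// Hypothetical stand-in for a request that has been started.
struct Request {
    bytes: u32,
}

impl IdleRequest {
    fn new(bytes: u32) -> Self {
        Self { bytes }
    }

    /// Size queries are available before the request is started.
    fn bytes(&self) -> u32 {
        self.bytes
    }

    /// Consume the idle request and return the started one. Because `self`
    /// is moved, `start` cannot be called twice on the same request.
    fn start(self) -> Request {
        Request { bytes: self.bytes }
    }
}

impl Request {
    /// Complete the request and report how many bytes it carried.
    fn end_ok(self) -> u32 {
        self.bytes
    }
}
```

A driver-like caller would inspect the idle request, decide whether to
proceed, and only then call `start()` followed by `end_ok()`.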
drivers/block/rnull/rnull.rs | 3 +-
rust/kernel/block/mq.rs | 5 +-
rust/kernel/block/mq/operations.rs | 15 ++--
rust/kernel/block/mq/request.rs | 149 +++++++++++++++++++++++++++++++------
4 files changed, 137 insertions(+), 35 deletions(-)
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index fd9b770965a6..bb8c4df08218 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -593,9 +593,10 @@ fn new_request_data() -> impl PinInit<Self::RequestData> {
fn queue_rq(
hw_data: Pin<&SpinLock<HwQueueContext>>,
this: Pin<&Self>,
- mut rq: Owned<mq::Request<Self>>,
+ rq: Owned<mq::IdleRequest<Self>>,
_is_last: bool,
) -> Result {
+ let mut rq = rq.start();
let mut sectors = rq.sectors();
Self::handle_bad_blocks(this.get_ref(), &mut rq, &mut sectors)?;
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index b095cc7f51ce..77e3593e8626 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -88,10 +88,10 @@
//! fn queue_rq(
//! _hw_data: (),
//! _queue_data: (),
-//! rq: Owned<Request<Self>>,
+//! rq: Owned<IdleRequest<Self>>,
//! _is_last: bool
//! ) -> Result {
-//! rq.end_ok();
+//! rq.start().end_ok();
//! Ok(())
//! }
//!
@@ -131,6 +131,7 @@
pub use operations::Operations;
pub use request::{
+ IdleRequest,
Request,
RequestTimerHandle, //
};
diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
index 1b20df25d6df..01917ef213d1 100644
--- a/rust/kernel/block/mq/operations.rs
+++ b/rust/kernel/block/mq/operations.rs
@@ -8,6 +8,7 @@
bindings,
block::mq::{
request::RequestDataWrapper,
+ IdleRequest,
Request, //
},
error::{
@@ -25,10 +26,7 @@
Owned, //
},
};
-use core::{
- marker::PhantomData,
- ptr::NonNull, //
-};
+use core::marker::PhantomData;
use pin_init::PinInit;
type ForeignBorrowed<'a, T> = <T as ForeignOwnable>::Borrowed<'a>;
@@ -82,7 +80,7 @@ pub trait Operations: Sized {
fn queue_rq(
hw_data: ForeignBorrowed<'_, Self::HwData>,
queue_data: ForeignBorrowed<'_, Self::QueueData>,
- rq: Owned<Request<Self>>,
+ rq: Owned<IdleRequest<Self>>,
is_last: bool,
) -> Result;
@@ -154,14 +152,14 @@ impl<T: Operations> OperationsVTable<T> {
== 0
);
+ // INVARIANT: By C API contract, `bd.rq` has not been started yet.
// SAFETY:
// - By API contract, we own the request.
// - By the safety requirements of this function, `request` is a valid
// `struct request` and the private data is properly initialized.
// - `rq` will be alive until `blk_mq_end_request` is called and is
// reference counted by until then.
- let mut rq =
- unsafe { Owned::from_raw(NonNull::<Request<T>>::new_unchecked((*bd).rq.cast())) };
+ let rq = unsafe { IdleRequest::from_raw((*bd).rq) };
// SAFETY: The safety requirement for this function ensure that `hctx`
// is valid and that `driver_data` was produced by a call to
@@ -177,9 +175,6 @@ impl<T: Operations> OperationsVTable<T> {
// dropped, which happens after we are dropped.
let queue_data = unsafe { T::QueueData::borrow(queue_data) };
- // SAFETY: We have exclusive access and we just set the refcount above.
- unsafe { rq.start_unchecked() };
-
let ret = T::queue_rq(
hw_data,
queue_data,
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index c06907dfe5b5..f94e9c2181d0 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -24,6 +24,7 @@
HrTimerPointer, //
},
types::{
+ ForeignOwnable,
Opaque,
Ownable,
OwnableRefCounted,
@@ -33,6 +34,7 @@
use core::{
ffi::c_void,
marker::PhantomData,
+ ops::Deref,
pin::Pin,
ptr::NonNull, //
};
@@ -42,6 +44,104 @@
BioIterator, //
};
+/// A [`Request`] that a driver has not yet begun to process.
+///
+/// A driver can convert an `IdleRequest` to a [`Request`] by calling [`IdleRequest::start`].
+///
+/// # Invariants
+///
+/// - This request has not been started yet.
+#[repr(transparent)]
+pub struct IdleRequest<T>(RequestInner<T>);
+
+impl<T: Operations> IdleRequest<T> {
+ /// Mark the request as processing.
+ ///
+ /// This converts the [`IdleRequest`] into a [`Request`].
+ pub fn start(self: Owned<Self>) -> Owned<Request<T>> {
+ // SAFETY: By type invariant `self.0.0` is a valid request. Because we have an `Owned<_>`,
+ // the refcount is zero.
+ let mut request = unsafe { Request::from_raw(self.0 .0.get()) };
+
+ debug_assert!(
+ request
+ .wrapper_ref()
+ .refcount()
+ .as_atomic()
+ .load(ordering::Acquire)
+ == 0
+ );
+
+ // SAFETY: We have exclusive access and the refcount is 0. By type invariant `request` was
+ // not started yet.
+ unsafe { request.start_unchecked() };
+
+ request
+ }
+
+ /// Create a [`Self`] from a raw request pointer.
+ ///
+ /// # Safety
+ ///
+ /// - The request pointed to by `ptr` must satisfy the invariants of both [`Request`] and
+ /// [`Self`].
+ /// - The refcount of the request pointed to by `ptr` must be 0.
+ pub(crate) unsafe fn from_raw(ptr: *mut bindings::request) -> Owned<Self> {
+ // SAFETY: By function safety requirements, `ptr` is valid for use as an `IdleRequest`.
+ unsafe { Owned::from_raw(NonNull::<Self>::new_unchecked(ptr.cast())) }
+ }
+}
+
+impl<T: Operations> Ownable for IdleRequest<T> {
+ // The `release` implementation leaks the `IdleRequest`, which is a valid state for a
+ // [`Request`] with refcount 0.
+ unsafe fn release(&mut self) {}
+}
+
+impl<T: Operations> Deref for IdleRequest<T> {
+ type Target = RequestInner<T>;
+
+ fn deref(&self) -> &Self::Target {
+ &self.0
+ }
+}
+
+pub struct RequestInner<T>(Opaque<bindings::request>, PhantomData<T>);
+
+impl<T: Operations> RequestInner<T> {
+ /// Get the command identifier for the request
+ pub fn command(&self) -> u32 {
+ // SAFETY: By C API contract and type invariant, `cmd_flags` is valid for read
+ unsafe { (*self.0.get()).cmd_flags & ((1 << bindings::REQ_OP_BITS) - 1) }
+ }
+
+ /// Get the target sector for the request.
+ #[inline(always)]
+ pub fn sector(&self) -> u64 {
+ // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
+ unsafe { (*self.0.get()).__sector }
+ }
+
+ /// Get the size of the request in number of sectors.
+ #[inline(always)]
+ pub fn sectors(&self) -> u32 {
+ self.bytes() >> crate::block::SECTOR_SHIFT
+ }
+
+ /// Get the size of the request in bytes.
+ #[inline(always)]
+ pub fn bytes(&self) -> u32 {
+ // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
+ unsafe { (*self.0.get()).__data_len }
+ }
+
+ /// Borrow the queue data from the request queue associated with this request.
+ pub fn queue_data(&self) -> <T::QueueData as ForeignOwnable>::Borrowed<'_> {
+ // SAFETY: By type invariants of `Request`, `self.0` is a valid request.
+ unsafe { T::QueueData::borrow((*(*self.0.get()).q).queuedata) }
+ }
+}
+
/// A wrapper around a blk-mq [`struct request`]. This represents an IO request.
///
/// # Lifetime
@@ -96,9 +196,28 @@
/// [`struct request`]: srctree/include/linux/blk-mq.h
///
#[repr(transparent)]
-pub struct Request<T>(Opaque<bindings::request>, PhantomData<T>);
+pub struct Request<T>(RequestInner<T>);
+
+impl<T: Operations> Deref for Request<T> {
+ type Target = RequestInner<T>;
+
+ fn deref(&self) -> &Self::Target {
+ &self.0
+ }
+}
impl<T: Operations> Request<T> {
+ /// Create an `Owned<Request>` from a request pointer.
+ ///
+ /// # Safety
+ ///
+ /// - `ptr` must satisfy invariants of `Request`.
+ /// - The refcount of the request pointed to by `ptr` must be 0.
+ pub(crate) unsafe fn from_raw(ptr: *mut bindings::request) -> Owned<Self> {
+ // SAFETY: By function safety requirements, `ptr` is valid for use as `Owned<Request>`.
+ unsafe { Owned::from_raw(NonNull::<Self>::new_unchecked(ptr.cast())) }
+ }
+
/// Create an [`ARef<Request>`] from a [`struct request`] pointer.
///
/// # Safety
@@ -120,7 +239,7 @@ pub(crate) unsafe fn aref_from_raw(ptr: *mut bindings::request) -> ARef<Self> {
pub fn command(&self) -> u32 {
use core::ops::BitAnd;
// SAFETY: By C API contract and type invariant, `cmd_flags` is valid for read
- unsafe { (*self.0.get()).cmd_flags }.bitand((1u32 << bindings::REQ_OP_BITS) - 1)
+ unsafe { (*self.0 .0.get()).cmd_flags }.bitand((1u32 << bindings::REQ_OP_BITS) - 1)
}
/// Complete the request by scheduling `Operations::complete` for
@@ -145,7 +264,7 @@ pub fn complete(this: ARef<Self>) {
pub fn bio(&self) -> Option<&Bio> {
// SAFETY: By type invariant of `Self`, `self.0` is valid and the deref
// is safe.
- let ptr = unsafe { (*self.0.get()).bio };
+ let ptr = unsafe { (*self.0 .0.get()).bio };
// SAFETY: By C API contract, if `bio` is not null it will have a
// positive refcount at least for the duration of the lifetime of
// `&self`.
@@ -157,7 +276,7 @@ pub fn bio(&self) -> Option<&Bio> {
pub fn bio_mut(self: Pin<&mut Self>) -> Option<Pin<&mut Bio>> {
// SAFETY: By type invariant of `Self`, `self.0` is valid and the deref
// is safe.
- let ptr = unsafe { (*self.0.get()).bio };
+ let ptr = unsafe { (*self.0 .0.get()).bio };
// SAFETY: By C API contract, if `bio` is not null it will have a
// positive refcount at least for the duration of the lifetime of
// `&mut self`.
@@ -171,25 +290,11 @@ pub fn bio_iter_mut<'a>(self: &'a mut Owned<Self>) -> BioIterator<'a> {
// `NonNull::new` will return `None` if the pointer is null.
BioIterator {
// SAFETY: By type invariant `self.0` is a valid `struct request`.
- bio: NonNull::new(unsafe { (*self.0.get()).bio.cast() }),
+ bio: NonNull::new(unsafe { (*self.0 .0.get()).bio.cast() }),
_p: PhantomData,
}
}
- /// Get the target sector for the request.
- #[inline(always)]
- pub fn sector(&self) -> u64 {
- // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
- unsafe { (*self.0.get()).__sector }
- }
-
- /// Get the size of the request in number of sectors.
- #[inline(always)]
- pub fn sectors(&self) -> u32 {
- // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
- (unsafe { (*self.0.get()).__data_len }) >> crate::block::SECTOR_SHIFT
- }
-
/// Return a pointer to the [`RequestDataWrapper`] stored in the private area
/// of the request structure.
///
@@ -328,10 +433,10 @@ impl<T: Operations> Owned<Request<T>> {
/// `self.wrapper_ref().refcount() == 0`.
///
/// This can only be called once in the request life cycle.
- pub(crate) unsafe fn start_unchecked(&mut self) {
+ pub unsafe fn start_unchecked(&mut self) {
// SAFETY: By type invariant, `self.0` is a valid `struct request` and
// we have exclusive access.
- unsafe { bindings::blk_mq_start_request(self.0.get()) };
+ unsafe { bindings::blk_mq_start_request(self.0 .0.get()) };
}
/// Notify the block layer that the request has been completed without errors.
@@ -341,7 +446,7 @@ pub fn end_ok(self) {
/// Notify the block layer that the request has been completed.
pub fn end(self, status: u8) {
- let request_ptr = self.0.get().cast();
+ let request_ptr = self.0 .0.get().cast();
core::mem::forget(self);
// SAFETY: By type invariant, `this.0` was a valid `struct request`. The
// existence of `self` guarantees that there are no `ARef`s pointing to
--
2.51.2
* [PATCH v2 68/83] block: rust: add an abstraction for `struct rq_list`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add the `RequestList` type as a safe wrapper around the C `struct
rq_list`. This type provides methods to iterate over and manipulate
lists of block requests, which is needed for implementing the
`queue_rqs` callback.
The abstraction includes methods for popping requests from the list,
checking if the list is empty, and peeking at the head request.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
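Note (illustration only, not part of the patch): the list semantics
described above can be sketched in plain userspace Rust. This is a minimal
model under stated assumptions: `ModelRequestList` and `drain_in_order` are
hypothetical names, a `VecDeque` stands in for the C `struct rq_list`, and
the generic `R` stands in for `Owned<IdleRequest<T>>`. It only shows the
head-pop / tail-push ordering and the draining iterator, not the kernel API.

```rust
use std::collections::VecDeque;

/// Hypothetical userspace stand-in for `RequestList<T>`: a `VecDeque`
/// replaces the C `struct rq_list`, and `R` replaces `Owned<IdleRequest<T>>`.
pub struct ModelRequestList<R> {
    inner: VecDeque<R>,
}

impl<R> ModelRequestList<R> {
    /// Create an empty list.
    pub fn new() -> Self {
        Self {
            inner: VecDeque::new(),
        }
    }

    /// Check if the list is empty.
    pub fn empty(&self) -> bool {
        self.inner.is_empty()
    }

    /// Pop the request at the head of the list, if any.
    pub fn pop(&mut self) -> Option<R> {
        self.inner.pop_front()
    }

    /// Push a request on the tail of the list.
    pub fn push_tail(&mut self, rq: R) {
        self.inner.push_back(rq);
    }

    /// Peek at the head of the list without removing it.
    pub fn peek(&self) -> Option<&R> {
        self.inner.front()
    }
}

/// Iterating over `&mut list` drains it by repeated `pop()`, mirroring the
/// `Iterator for &mut RequestList<T>` implementation in the patch.
impl<R> Iterator for &mut ModelRequestList<R> {
    type Item = R;

    fn next(&mut self) -> Option<R> {
        self.pop()
    }
}

/// Drain the list the way a `queue_rqs` style callback might, returning the
/// requests in submission order.
pub fn drain_in_order<R>(list: &mut ModelRequestList<R>) -> Vec<R> {
    list.collect()
}
```

Because the iterator is implemented on `&mut ModelRequestList<R>`, a caller
can stop iterating early and the remaining entries stay on the list.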
rust/helpers/blk.c | 26 ++++++++
rust/kernel/block/mq.rs | 2 +
rust/kernel/block/mq/request_list.rs | 119 +++++++++++++++++++++++++++++++++++
3 files changed, 147 insertions(+)
diff --git a/rust/helpers/blk.c b/rust/helpers/blk.c
index 500e3c6fd951..422289d617ae 100644
--- a/rust/helpers/blk.c
+++ b/rust/helpers/blk.c
@@ -27,3 +27,29 @@ bool rust_helper_blk_mq_add_to_batch(struct request *req,
{
return blk_mq_add_to_batch(req, iob, is_error, complete);
}
+
+__rust_helper struct request *rust_helper_rq_list_pop(struct rq_list *rl)
+{
+ return rq_list_pop(rl);
+}
+
+__rust_helper int rust_helper_rq_list_empty(const struct rq_list *rl)
+{
+ return rq_list_empty(rl);
+}
+
+__rust_helper void rust_helper_rq_list_add_tail(struct rq_list *rl,
+ struct request *rq)
+{
+ rq_list_add_tail(rl, rq);
+}
+
+__rust_helper void rust_helper_rq_list_init(struct rq_list *rl)
+{
+ rq_list_init(rl);
+}
+
+__rust_helper struct request *rust_helper_rq_list_peek(struct rq_list *rl)
+{
+ return rq_list_peek(rl);
+}
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index 7c346be843e1..e8f0d03f2ff7 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -129,6 +129,7 @@
pub mod gen_disk;
mod operations;
mod request;
+mod request_list;
mod request_queue;
pub mod tag_set;
@@ -148,6 +149,7 @@
Request,
RequestTimerHandle, //
};
+pub use request_list::RequestList;
pub use request_queue::RequestQueue;
pub use tag_set::{
QueueType,
diff --git a/rust/kernel/block/mq/request_list.rs b/rust/kernel/block/mq/request_list.rs
new file mode 100644
index 000000000000..82e6005126f7
--- /dev/null
+++ b/rust/kernel/block/mq/request_list.rs
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::marker::PhantomData;
+
+use crate::{
+ owned::Owned,
+ types::Opaque, //
+};
+
+use super::{
+ IdleRequest,
+ Operations, //
+};
+
+/// A list of [`Request`].
+///
+/// # Invariants
+///
+/// - `self.inner` is always a valid list, meaning the `next` and `prev`
+/// pointers point to valid requests, or are both null.
+/// - All requests in the list are valid for use as `IdleRequest<T>`.
+#[repr(transparent)]
+pub struct RequestList<T: Operations> {
+ inner: Opaque<bindings::rq_list>,
+ _p: PhantomData<T>,
+}
+
+impl<T: Operations> RequestList<T> {
+ /// Create a new [`RequestList`].
+ pub fn new() -> Self {
+ let this = Self {
+ inner: Opaque::zeroed(),
+ _p: PhantomData,
+ };
+
+ // NOTE: We are actually good to go, but we call the C initializer for forward
+ // compatibility.
+ // SAFETY: `this.inner` is a valid allocation for use as `bindings::rq_list`.
+ unsafe { bindings::rq_list_init(this.inner.get()) }
+
+ // INVARIANT: `this.inner` was initialized above and is empty.
+ this
+ }
+
+ /// Create a mutable reference to a [`RequestList`] from a raw pointer.
+ ///
+ /// # Safety
+ /// - The list pointed to by `ptr` must satisfy the invariants of `Self`.
+ /// - The list pointed to by `ptr` must remain valid for use as a mutable reference for the
+ /// duration of `'a`.
+ pub unsafe fn from_raw<'a>(ptr: *mut bindings::rq_list) -> &'a mut Self {
+ // SAFETY:
+ // - RequestList is transparent.
+ // - By function safety requirements, `ptr` is valid for use as a mutable reference.
+ unsafe { &mut (*ptr.cast()) }
+ }
+
+ /// Check if the list is empty.
+ pub fn empty(&self) -> bool {
+ // SAFETY: By type invariant, self.inner is valid.
+ let ret = unsafe { bindings::rq_list_empty(self.inner.get()) };
+ ret != 0
+ }
+
+ /// Pop a request from the list.
+ ///
+ /// Returns [`None`] if the list is empty.
+ pub fn pop(&mut self) -> Option<Owned<IdleRequest<T>>> {
+ // SAFETY: By type invariant `self.inner` is a valid list.
+ let ptr = unsafe { bindings::rq_list_pop(self.inner.get()) };
+
+ if !ptr.is_null() {
+ // SAFETY: If `rq_list_pop` returns a non-null pointer, it points to a valid request. By
+ // type invariant all requests in this list are valid for use as `IdleRequest`.
+ Some(unsafe { IdleRequest::from_raw(ptr) })
+ } else {
+ None
+ }
+ }
+
+ /// Push a request on the tail of the list.
+ pub fn push_tail(&mut self, rq: Owned<IdleRequest<T>>) {
+ let ptr = rq.as_raw();
+ core::mem::forget(rq);
+ // INVARIANT: rq is an `IdleRequest<T>`.
+ // SAFETY: By type invariant, `self.inner` is a valid list.
+ unsafe { bindings::rq_list_add_tail(self.inner.get(), ptr) };
+ }
+
+ /// Peek at the head of the list.
+ ///
+ /// Returns a null pointer if the list is empty.
+ pub fn peek_raw(&self) -> *mut bindings::request {
+ // SAFETY: By type invariant, `self.inner` is a valid list.
+ unsafe { bindings::rq_list_peek(self.inner.get()) }
+ }
+}
+
+impl<T: Operations> Default for RequestList<T> {
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+impl<T: Operations> Drop for RequestList<T> {
+ fn drop(&mut self) {
+ while let Some(rq) = self.pop() {
+ drop(rq)
+ }
+ }
+}
+
+impl<T: Operations> Iterator for &mut RequestList<T> {
+ type Item = Owned<IdleRequest<T>>;
+
+ fn next(&mut self) -> Option<Self::Item> {
+ self.pop()
+ }
+}
--
2.51.2
* [PATCH v2 57/83] block: rust: add accessors to `TagSet`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add `hw_queue_count()` to query the number of hardware queues and
`data()` to borrow the private tag set data associated with a `TagSet`.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
rust/kernel/block/mq/tag_set.rs | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/rust/kernel/block/mq/tag_set.rs b/rust/kernel/block/mq/tag_set.rs
index e62dfd267fd9..858c1b952b00 100644
--- a/rust/kernel/block/mq/tag_set.rs
+++ b/rust/kernel/block/mq/tag_set.rs
@@ -4,8 +4,6 @@
//!
//! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h)
-use core::pin::Pin;
-
use crate::{
alloc::NumaNode,
bindings,
@@ -26,7 +24,8 @@
};
use core::{
convert::TryInto,
- marker::PhantomData, //
+ marker::PhantomData,
+ pin::Pin, //
};
use pin_init::{
pin_data,
@@ -164,6 +163,22 @@ pub fn update_maps(self: Pin<&mut Self>, mut cb: impl FnMut(QueueMap)) -> Result
Ok(())
}
+
+ /// Return the number of hardware queues for this tag set.
+ pub fn hw_queue_count(&self) -> u32 {
+ // SAFETY: By type invariant, `self.inner` is valid.
+ unsafe { (*self.inner.get()).nr_hw_queues }
+ }
+
+ /// Borrow the [`T::TagSetData`] associated with this tag set.
+ pub fn data(&self) -> <T::TagSetData as ForeignOwnable>::Borrowed<'_> {
+ // SAFETY: By type invariant, `self.inner` is valid.
+ let ptr = unsafe { (*self.inner.get()).driver_data };
+
+ // SAFETY: `ptr` was created by `into_foreign` during initialization and the target is not
+ // converted back with `from_foreign` while `&self` is live.
+ unsafe { T::TagSetData::borrow(ptr) }
+ }
}
#[pinned_drop]
--
2.51.2
* [PATCH v2 40/83] block: rust: add a method to get the request queue for a request
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a method to `Request` for obtaining the associated `RequestQueue`.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
rust/kernel/block/mq/request.rs | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index f94e9c2181d0..a05df2351c2c 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -39,6 +39,7 @@
ptr::NonNull, //
};
+use super::RequestQueue;
use crate::block::bio::{
Bio,
BioIterator, //
@@ -140,6 +141,12 @@ pub fn queue_data(&self) -> <T::QueueData as ForeignOwnable>::Borrowed<'_> {
// SAFETY: By type invariants of `Request`, `self.0` is a valid request.
unsafe { T::QueueData::borrow((*(*self.0.get()).q).queuedata) }
}
+
+ /// Get the request queue associated with this request.
+ pub fn queue(&self) -> &RequestQueue<T> {
+ // SAFETY: By type invariant, self.0 is guaranteed to be valid.
+ unsafe { RequestQueue::from_raw((*self.0.get()).q) }
+ }
}
/// A wrapper around a blk-mq [`struct request`]. This represents an IO request.
--
2.51.2
* [PATCH v2 63/83] block: rust: add `Segment::copy_to_page_limit`
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a method to `block::bio::Segment` to copy a bounded number of bytes
to a page.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
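Note (illustration only, not part of the patch): the length calculation in
`copy_to_page_limit` can be checked in isolation. The sketch below is a
standalone model, assuming a 4096 byte page; `copy_length` is a
hypothetical helper name, and its parameters stand in for `self.offset()`,
`self.len()`, `dst_offset` and `limit`.

```rust
/// Assumed page size for this illustration; the kernel value depends on the
/// configuration.
pub const PAGE_SIZE: usize = 4096;

/// Number of bytes a single call would copy.
///
/// The result is bounded by the distance to the next source page boundary,
/// the remaining segment length, the space left in the destination page and,
/// if `limit` is non-zero, by `limit`.
pub fn copy_length(
    src_offset: usize,
    segment_len: usize,
    dst_offset: usize,
    limit: usize,
) -> usize {
    debug_assert!(dst_offset <= PAGE_SIZE);

    let mut length = (PAGE_SIZE - src_offset % PAGE_SIZE)
        .min(segment_len)
        .min(PAGE_SIZE - dst_offset);

    // A limit of zero means "no limit", matching `copy_to_page`.
    if limit > 0 {
        length = length.min(limit);
    }

    length
}
```

With these definitions, `copy_length(0, 8192, 0, 512)` is bounded by the
limit, while `copy_length(0, 8192, 0, 0)` is bounded by the page size.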
rust/kernel/block/bio/vec.rs | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/block/bio/vec.rs b/rust/kernel/block/bio/vec.rs
index 61d83a07397f..82e89a1d17c3 100644
--- a/rust/kernel/block/bio/vec.rs
+++ b/rust/kernel/block/bio/vec.rs
@@ -102,13 +102,38 @@ pub fn truncate(&mut self, new_len: u32) {
/// Returns the number of bytes copied.
#[inline(always)]
pub fn copy_to_page(&mut self, dst_page: Pin<&mut SafePage>, dst_offset: usize) -> usize {
+ self.copy_to_page_limit(dst_page, dst_offset, 0)
+ }
+
+ /// Copy data of this segment into `dst_page`.
+ ///
+ /// Copies at most `limit` bytes of data from the current offset to the next page boundary,
+ /// that is, at most `PAGE_SIZE - (self.offset() % PAGE_SIZE)` bytes of data. Data is placed at
+ /// offset `dst_offset` in the target page. This call will advance offset and reduce length of
+ /// `self`.
+ ///
+ /// If `limit` is zero it is ignored.
+ ///
+ /// Returns the number of bytes copied.
+ #[inline(always)]
+ pub fn copy_to_page_limit(
+ &mut self,
+ dst_page: Pin<&mut SafePage>,
+ dst_offset: usize,
+ limit: usize,
+ ) -> usize {
// SAFETY: We are not moving out of `dst_page`.
let dst_page = unsafe { Pin::into_inner_unchecked(dst_page) };
let src_offset = self.offset() % PAGE_SIZE;
debug_assert!(dst_offset <= PAGE_SIZE);
- let length = (PAGE_SIZE - src_offset)
+ let mut length = (PAGE_SIZE - src_offset)
.min(self.len() as usize)
.min(PAGE_SIZE - dst_offset);
+
+ if limit > 0 {
+ length = length.min(limit);
+ }
+
let page_idx = self.offset() / PAGE_SIZE;
// SAFETY: self.bio_vec is valid and thus bv_page must be a valid
--
2.51.2
* [PATCH v2 64/83] block: rnull: add fua support
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add Forced Unit Access (FUA) support to rnull. When enabled via the `fua`
configfs attribute, the driver advertises FUA capability and handles FUA
requests by bypassing the volatile cache in the write path.
FUA support requires memory backing and write cache to be enabled.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
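Note (illustration only, not part of the patch): the cache bypass described
above can be modelled with two maps. This is a minimal sketch under stated
assumptions: `ModelStorage` and its methods are hypothetical names, one
byte stands in for a sector of data, and dropping the cached copy on a
bypassing write is my reading of the `set_free()` call in `write()`.

```rust
use std::collections::HashMap;

/// Hypothetical model of a memory backed device with an optional volatile
/// write cache in front of the backing store.
pub struct ModelStorage {
    cache_enabled: bool,
    cache: HashMap<u64, u8>,
    disk: HashMap<u64, u8>,
}

impl ModelStorage {
    pub fn new(cache_enabled: bool) -> Self {
        Self {
            cache_enabled,
            cache: HashMap::new(),
            disk: HashMap::new(),
        }
    }

    /// FUA is only advertised when the option is set and a cache exists.
    pub fn advertises_fua(&self, fua_option: bool) -> bool {
        fua_option && self.cache_enabled
    }

    /// Write one sector. With `bypass_cache` set (a FUA write), any cached
    /// copy of the sector is dropped and the data goes to the backing store.
    pub fn write(&mut self, sector: u64, value: u8, bypass_cache: bool) {
        if bypass_cache {
            self.cache.remove(&sector);
        }

        if self.cache_enabled && !bypass_cache {
            self.cache.insert(sector, value);
        } else {
            self.disk.insert(sector, value);
        }
    }

    /// Contents of the volatile cache for `sector`, if any.
    pub fn cached(&self, sector: u64) -> Option<u8> {
        self.cache.get(&sector).copied()
    }

    /// Contents of the backing store for `sector`, if any.
    pub fn on_disk(&self, sector: u64) -> Option<u8> {
        self.disk.get(&sector).copied()
    }
}
```

A regular write followed by a FUA write to the same sector leaves the
sector only in the backing store in this model.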
drivers/block/rnull/configfs.rs | 5 ++++
drivers/block/rnull/disk_storage.rs | 22 +++++++++++++----
drivers/block/rnull/disk_storage/page.rs | 1 +
drivers/block/rnull/rnull.rs | 41 ++++++++++++++++++++++++++------
4 files changed, 58 insertions(+), 11 deletions(-)
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index 0637c1e0ab22..8195d645ecc6 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -128,6 +128,7 @@ fn make_group(
zone_max_active: 25,
zone_append_max_sectors: 26,
poll_queues: 27,
+ fua: 28,
],
};
@@ -169,6 +170,7 @@ fn make_group(
zone_max_active: 0,
zone_append_max_sectors: u32::MAX,
poll_queues: 0,
+ fua: true,
}),
}),
core::iter::empty(),
@@ -256,6 +258,7 @@ struct DeviceConfigInner {
zone_max_active: u32,
zone_append_max_sectors: u32,
poll_queues: u32,
+ fua: bool,
}
#[vtable]
@@ -322,6 +325,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
zone_max_open: guard.zone_max_open,
zone_max_active: guard.zone_max_active,
zone_append_max_sectors: guard.zone_append_max_sectors,
+ forced_unit_access: guard.fua,
})?);
guard.powered = true;
} else if guard.powered && !power_op {
@@ -515,3 +519,4 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
}
})
);
+configfs_simple_bool_field!(DeviceConfig, 28, fua);
diff --git a/drivers/block/rnull/disk_storage.rs b/drivers/block/rnull/disk_storage.rs
index 7667830bd616..4a9bf480221f 100644
--- a/drivers/block/rnull/disk_storage.rs
+++ b/drivers/block/rnull/disk_storage.rs
@@ -92,6 +92,10 @@ pub(crate) fn flush(&self, hw_data: &Pin<&SpinLock<HwQueueContext>>) -> Result {
let mut access = self.access(&mut tree_guard, &mut hw_data_guard, None);
access.flush()
}
+
+ pub(crate) fn cache_enabled(&self) -> bool {
+ self.cache_size > 0
+ }
}
pub(crate) struct DiskStorageAccess<'a, 'b, 'c> {
@@ -205,7 +209,7 @@ fn flush(&mut self) -> Result {
Ok(())
}
- fn get_cache_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
+ fn get_or_alloc_cache_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
let index = Self::to_index(sector);
match self.cache_guard.entry(index) {
@@ -239,6 +243,12 @@ fn get_cache_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
}
}
+ pub(crate) fn get_cache_page(&mut self, sector: u64) -> Option<&mut NullBlockPage> {
+ let index = Self::to_index(sector);
+
+ self.cache_guard.get_mut(index)
+ }
+
fn get_disk_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
let index = Self::to_index(sector);
@@ -256,9 +266,13 @@ fn get_disk_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
Ok(page)
}
- pub(crate) fn get_write_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
- let page = if self.disk_storage.cache_size > 0 {
- self.get_cache_page(sector)?
+ pub(crate) fn get_write_page(
+ &mut self,
+ sector: u64,
+ bypass_cache: bool,
+ ) -> Result<&mut NullBlockPage> {
+ let page = if self.disk_storage.cache_size > 0 && !bypass_cache {
+ self.get_or_alloc_cache_page(sector)?
} else {
self.get_disk_page(sector)?
};
diff --git a/drivers/block/rnull/disk_storage/page.rs b/drivers/block/rnull/disk_storage/page.rs
index 88dc9a2476bd..846269d31c63 100644
--- a/drivers/block/rnull/disk_storage/page.rs
+++ b/drivers/block/rnull/disk_storage/page.rs
@@ -15,6 +15,7 @@
uapi::PAGE_SECTORS, //
};
+// TODO: Use rust bitmap
static_assert!((PAGE_SIZE >> SECTOR_SHIFT) <= 64);
pub(crate) struct NullBlockPage {
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 0695cbd07f1d..c3126b923367 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -191,6 +191,10 @@
default: 0,
description: "Number of IOPOLL submission queues.",
},
+ fua: bool {
+ default: true,
+ description: "Enable/disable FUA support when cache_size is used.",
+ },
},
}
@@ -267,6 +271,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
zone_max_open: module_parameters::zone_max_open.value(),
zone_max_active: module_parameters::zone_max_active.value(),
zone_append_max_sectors: module_parameters::zone_append_max_sectors.value(),
+ forced_unit_access: module_parameters::fua.value(),
})?;
disks.push(disk, GFP_KERNEL)?;
}
@@ -307,6 +312,7 @@ struct NullBlkOptions<'a> {
zone_max_active: u32,
#[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(dead_code))]
zone_append_max_sectors: u32,
+ forced_unit_access: bool,
}
#[pin_data]
@@ -422,6 +428,7 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
zone_max_active,
#[cfg_attr(not(CONFIG_BLK_DEV_ZONED), allow(unused_variables))]
zone_append_max_sectors,
+ forced_unit_access,
} = options;
let memory_backed = tag_set.memory_backed;
@@ -439,9 +446,10 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
return Err(code::EINVAL);
}
+ let s = storage.clone();
let queue_data = Arc::try_pin_init(
try_pin_init!(Self {
- storage,
+ storage: s,
irq_mode,
completion_time,
memory_backed,
@@ -474,7 +482,9 @@ fn new(options: NullBlkOptions<'_>) -> Result<Arc<GenDisk<Self>>> {
.capacity_sectors(device_capacity_sectors)
.logical_block_size(block_size_bytes)?
.physical_block_size(block_size_bytes)?
- .rotational(rotational);
+ .rotational(rotational)
+ .write_cache(storage.cache_enabled())
+ .forced_unit_access(forced_unit_access && storage.cache_enabled());
#[cfg(CONFIG_BLK_DEV_ZONED)]
{
@@ -553,6 +563,7 @@ fn write<'a, 'b, 'c>(
hw_data_guard: &'b mut SpinLockGuard<'c, HwQueueContext>,
mut sector: u64,
mut segment: Segment<'_>,
+ bypass_cache: bool,
) -> Result {
let mut sheaf: Option<XArraySheaf<'_>> = None;
@@ -561,7 +572,13 @@ fn write<'a, 'b, 'c>(
let mut access = self.storage.access(tree_guard, hw_data_guard, sheaf);
- let page = access.get_write_page(sector)?;
+ if bypass_cache {
+ if let Some(page) = access.get_cache_page(sector) {
+ page.set_free(sector);
+ }
+ }
+
+ let page = access.get_write_page(sector, bypass_cache)?;
page.set_occupied(sector);
// CAST: Page offset always fits in 32 bits.
@@ -569,7 +586,11 @@ fn write<'a, 'b, 'c>(
((sector & u64::from(block::PAGE_SECTOR_MASK)) << block::SECTOR_SHIFT) as usize;
// CAST: Casting from `usize` to `u64` never overflows.
- sector += segment.copy_to_page(page.page_mut().as_pin_mut(), page_offset) as u64
+ sector += segment.copy_to_page_limit(
+ page.page_mut().as_pin_mut(),
+ page_offset,
+ self.block_size_bytes.try_into()?,
+ ) as u64
>> block::SECTOR_SHIFT;
sheaf = access.sheaf;
@@ -632,6 +653,8 @@ fn transfer(
let mut hw_data_guard = hw_data.lock();
let mut tree_guard = self.storage.lock();
+ let skip_cache = rq.flags().contains(mq::RequestFlag::ForcedUnitAccess);
+
for bio in rq.bio_iter_mut() {
let segment_iter = bio.segment_iter();
for mut segment in segment_iter {
@@ -641,9 +664,13 @@ fn transfer(
let length_sectors_allowed = segment_length_sectors.min(max_remaining_sectors);
segment.truncate(length_sectors_allowed << SECTOR_SHIFT);
match command {
- mq::Command::Write => {
- self.write(&mut tree_guard, &mut hw_data_guard, sector, segment)?
- }
+ mq::Command::Write => self.write(
+ &mut tree_guard,
+ &mut hw_data_guard,
+ sector,
+ segment,
+ skip_cache,
+ )?,
mq::Command::Read => {
self.read(&mut tree_guard, &mut hw_data_guard, sector, segment)?
}
--
2.51.2
* [PATCH v2 22/83] block: rust: mq: add max_hw_discard_sectors support to GenDiskBuilder
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add support for configuring the maximum hardware discard sectors
through GenDiskBuilder. This allows block devices to specify their
discard/trim capabilities.
Setting this value to 0 (the default) indicates that discard is not
supported by the device. Non-zero values specify the maximum number
of sectors that can be discarded in a single operation.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
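Note (illustration only, not part of the patch): the builder pattern used
here can be sketched standalone. `ModelDiskBuilder` and `ModelLimits` are
hypothetical stand-ins for `GenDiskBuilder` and the C `struct queue_limits`;
only the discard field is modelled, and `discard_supported()` is an
assumed helper that encodes the "0 means unsupported" rule from the commit
message.

```rust
/// Hypothetical stand-in for the queue limits handed to the block layer.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct ModelLimits {
    pub max_hw_discard_sectors: u32,
}

impl ModelLimits {
    /// A value of zero means the device does not support discard.
    pub fn discard_supported(&self) -> bool {
        self.max_hw_discard_sectors != 0
    }
}

/// Hypothetical stand-in for `GenDiskBuilder`, reduced to the new field.
#[derive(Debug, Default)]
pub struct ModelDiskBuilder {
    max_hw_discard_sectors: u32,
}

impl ModelDiskBuilder {
    /// Set the maximum number of sectors discarded in a single operation.
    pub fn max_hw_discard_sectors(mut self, max_hw_discard_sectors: u32) -> Self {
        self.max_hw_discard_sectors = max_hw_discard_sectors;
        self
    }

    /// Copy the configured value into the limits, as `build()` does.
    pub fn build_limits(self) -> ModelLimits {
        ModelLimits {
            max_hw_discard_sectors: self.max_hw_discard_sectors,
        }
    }
}
```

A builder left at its default yields limits that report discard as
unsupported; setting any non-zero value enables it.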
rust/kernel/block/mq/gen_disk.rs | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index b36d24382cc3..2b204b0ed49a 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -7,14 +7,27 @@
use crate::{
bindings,
- block::mq::{Operations, TagSet},
- error::{self, from_err_ptr, Result},
- fmt::{self, Write},
+ block::mq::{
+ Operations,
+ TagSet, //
+ },
+ error::{
+ self,
+ from_err_ptr,
+ Result, //
+ },
+ fmt::{
+ self,
+ Write, //
+ },
prelude::*,
static_lock_class,
str::NullTerminatedFormatter,
sync::Arc,
- types::{ForeignOwnable, ScopeGuard},
+ types::{
+ ForeignOwnable,
+ ScopeGuard, //
+ },
};
/// A builder for [`GenDisk`].
@@ -25,6 +38,7 @@ pub struct GenDiskBuilder {
logical_block_size: u32,
physical_block_size: u32,
capacity_sectors: u64,
+ max_hw_discard_sectors: u32,
}
impl Default for GenDiskBuilder {
@@ -34,6 +48,7 @@ fn default() -> Self {
logical_block_size: bindings::PAGE_SIZE as u32,
physical_block_size: bindings::PAGE_SIZE as u32,
capacity_sectors: 0,
+ max_hw_discard_sectors: 0,
}
}
}
@@ -94,6 +109,16 @@ pub fn capacity_sectors(mut self, capacity: u64) -> Self {
self
}
+ /// Set the maximum number of sectors the underlying hardware device can
+ /// discard/trim in a single operation.
+ ///
+ /// Setting 0 (default) here will cause the disk to report discard not
+ /// supported.
+ pub fn max_hw_discard_sectors(mut self, max_hw_discard_sectors: u32) -> Self {
+ self.max_hw_discard_sectors = max_hw_discard_sectors;
+ self
+ }
+
/// Build a new `GenDisk` and add it to the VFS.
pub fn build<T: Operations>(
self,
@@ -111,6 +136,7 @@ pub fn build<T: Operations>(
lim.logical_block_size = self.logical_block_size;
lim.physical_block_size = self.physical_block_size;
+ lim.max_hw_discard_sectors = self.max_hw_discard_sectors;
if self.rotational {
lim.features = bindings::BLK_FEAT_ROTATIONAL;
}
--
2.51.2
* [PATCH v2 58/83] block: rnull: add polled completion support
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add support for polled I/O completion in rnull. This feature requires
configuring poll queues via the `poll_queues` attribute.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
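Note (illustration only, not part of the patch): the queue map layout
computed in `map_queues()` can be sketched as a pure function. This is a
standalone model under stated assumptions: `QueueKind`, `QueueMapLayout`
and `layout_queue_maps` are hypothetical names, and the maps are assumed
to be visited in the order default, read, poll.

```rust
/// Hypothetical stand-in for `mq::QueueType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueueKind {
    Default,
    Read,
    Poll,
}

/// Queue count and starting hardware queue offset for one queue map.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct QueueMapLayout {
    pub kind: QueueKind,
    pub queue_count: u32,
    pub offset: u32,
}

/// Lay out submit queues first and poll queues after them. The read map
/// gets no queues of its own.
pub fn layout_queue_maps(submit_queues: u32, poll_queues: u32) -> Vec<QueueMapLayout> {
    let mut layout = Vec::new();
    let mut offset = 0;

    for kind in [QueueKind::Default, QueueKind::Read, QueueKind::Poll] {
        let queue_count = match kind {
            QueueKind::Default => submit_queues,
            QueueKind::Read => 0,
            QueueKind::Poll => poll_queues,
        };

        layout.push(QueueMapLayout {
            kind,
            queue_count,
            offset,
        });

        // The next map starts after the queues assigned to this one.
        offset += queue_count;
    }

    layout
}
```

With four submit queues and two poll queues, the poll map starts at
hardware queue 4, and the tag set needs `submit_queues + poll_queues`
hardware queues in total.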
drivers/block/rnull/configfs.rs | 19 +++++-
drivers/block/rnull/rnull.rs | 133 ++++++++++++++++++++++++++++++++++++----
2 files changed, 139 insertions(+), 13 deletions(-)
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index f866595a263c..0637c1e0ab22 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -81,7 +81,7 @@ impl AttributeOperations<0> for Config {
writer.write_str(
"blocksize,size,rotational,irqmode,completion_nsec,memory_backed,\
submit_queues,use_per_node_hctx,discard,blocking,shared_tags,\
- zoned,zone_size,zone_capacity\n",
+ zoned,zone_size,zone_capacity,poll_queues\n",
)?;
Ok(writer.bytes_written())
}
@@ -127,6 +127,7 @@ fn make_group(
zone_max_open: 24,
zone_max_active: 25,
zone_append_max_sectors: 26,
+ poll_queues: 27,
],
};
@@ -167,6 +168,7 @@ fn make_group(
zone_max_open: 0,
zone_max_active: 0,
zone_append_max_sectors: u32::MAX,
+ poll_queues: 0,
}),
}),
core::iter::empty(),
@@ -253,6 +255,7 @@ struct DeviceConfigInner {
zone_max_open: u32,
zone_max_active: u32,
zone_append_max_sectors: u32,
+ poll_queues: u32,
}
#[vtable]
@@ -305,6 +308,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
shared_tag_set: guard.shared_tags.then(|| guard.shared_tag_set.clone()),
tag_set: crate::TagSetOptions {
submit_queues: guard.submit_queues,
+ poll_queues: guard.poll_queues,
home_node: guard.home_node,
blocking: guard.blocking,
memory_backed: guard.memory_backed,
@@ -498,3 +502,16 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
configfs_simple_field!(DeviceConfig, 24, zone_max_open, u32);
configfs_simple_field!(DeviceConfig, 25, zone_max_active, u32);
configfs_simple_field!(DeviceConfig, 26, zone_append_max_sectors, u32);
+configfs_simple_field!(
+ DeviceConfig,
+ 27,
+ poll_queues,
+ u32,
+ check(|value| {
+ if value > kernel::cpu::num_possible_cpus() {
+ Err(kernel::error::code::EINVAL)
+ } else {
+ Ok(())
+ }
+ })
+);
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 076493f92516..edb4ef53d6ad 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -33,6 +33,7 @@
GenDisk,
GenDiskRef, //
},
+ IoCompletionBatch,
Operations,
TagSet, //
},
@@ -186,6 +187,10 @@
default: 0,
description: "Maximum size of a zone append command (in 512B sectors). Specify 0 for no zone append.",
},
+ poll_queues: u32 {
+ default: 0,
+ description: "Number of IOPOLL submission queues.",
+ },
},
}
@@ -207,6 +212,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
} else {
module_parameters::submit_queues.value()
};
+ let poll_queues = module_parameters::poll_queues.value();
let home_node = module_parameters::home_node.value();
let blocking = module_parameters::blocking.value();
let memory_backed = module_parameters::memory_backed.value();
@@ -215,6 +221,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
let shared_tag_set = NullBlkDevice::build_tag_set(TagSetOptions {
submit_queues,
+ poll_queues,
home_node,
blocking,
memory_backed,
@@ -246,6 +253,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
.then(|| shared_tag_set.clone()),
tag_set: TagSetOptions {
submit_queues,
+ poll_queues,
home_node,
blocking,
memory_backed,
@@ -325,6 +333,7 @@ struct NullBlkDevice {
struct TagSetOptions {
submit_queues: u32,
+ poll_queues: u32,
home_node: i32,
blocking: bool,
memory_backed: bool,
@@ -338,6 +347,7 @@ impl NullBlkDevice {
fn build_tag_set(options: TagSetOptions) -> Result<Arc<TagSet<Self>>> {
let TagSetOptions {
submit_queues,
+ poll_queues,
home_node,
blocking,
memory_backed,
@@ -364,7 +374,21 @@ fn build_tag_set(options: TagSetOptions) -> Result<Arc<TagSet<Self>>> {
}
Arc::pin_init(
- TagSet::new(submit_queues, (), hw_queue_depth, 1, numa_node, flags),
+ TagSet::new(
+ submit_queues + poll_queues,
+ KBox::new(
+ NullBlkTagsetData {
+ queue_depth: hw_queue_depth,
+ submit_queue_count: submit_queues,
+ poll_queue_count: poll_queues,
+ },
+ GFP_KERNEL,
+ )?,
+ hw_queue_depth,
+ if poll_queues == 0 { 1 } else { 3 },
+ numa_node,
+ flags,
+ ),
GFP_KERNEL,
)
}
@@ -729,6 +753,7 @@ fn run(
struct HwQueueContext {
page: Option<KBox<disk_storage::NullBlockPage>>,
+ poll_queue: kernel::alloc::ringbuffer::KRingBuffer<Owned<mq::Request<NullBlkDevice>>>,
}
#[pin_data]
@@ -757,11 +782,17 @@ impl HasHrTimer<Self> for Pdu {
}
}
+struct NullBlkTagsetData {
+ queue_depth: u32,
+ submit_queue_count: u32,
+ poll_queue_count: u32,
+}
+
#[vtable]
impl Operations for NullBlkDevice {
type QueueData = Arc<Self>;
type RequestData = Pdu;
- type TagSetData = ();
+ type TagSetData = KBox<NullBlkTagsetData>;
type HwData = Pin<KBox<SpinLock<HwQueueContext>>>;
fn new_request_data() -> impl PinInit<Self::RequestData> {
@@ -777,7 +808,7 @@ fn queue_rq(
this: ArcBorrow<'_, Self>,
rq: Owned<mq::IdleRequest<Self>>,
_is_last: bool,
- _is_poll: bool,
+ is_poll: bool,
) -> BlkResult {
if this.bandwidth_limit != 0 {
if !this.bandwidth_timer.active() {
@@ -814,13 +845,29 @@ fn queue_rq(
#[cfg(not(CONFIG_BLK_DEV_ZONED))]
this.handle_regular_command(&hw_data, &mut rq)?;
- match this.irq_mode {
- IRQMode::None => Self::end_request(rq),
- IRQMode::Soft => mq::Request::complete(rq.into()),
- IRQMode::Timer => {
- OwnableRefCounted::into_shared(rq)
- .start(this.completion_time)
- .dismiss();
+ if is_poll {
+ // NOTE: We lack the ability to insert `Owned<Request>` into a
+ // `kernel::list::List`, so we use a `RingBuffer` instead. The
+ // drawback of this is that we have to allocate the space for the
+ // ring buffer during drive initialization, and we have to hold the
+ // lock protecting the list until we have processed all the requests
+ // in the list. Change to a linked list when the kernel gets this
+ // ability.
+
+ // NOTE: We are processing requests during submit rather than during
+ // poll. This differs from the C driver, which does processing during
+ // poll.
+
+ hw_data.lock().poll_queue.push_head(rq)?;
+ } else {
+ match this.irq_mode {
+ IRQMode::None => Self::end_request(rq),
+ IRQMode::Soft => mq::Request::complete(rq.into()),
+ IRQMode::Timer => {
+ OwnableRefCounted::into_shared(rq)
+ .start(this.completion_time)
+ .dismiss();
+ }
}
}
Ok(())
@@ -828,8 +875,40 @@ fn queue_rq(
fn commit_rqs(_hw_data: Pin<&SpinLock<HwQueueContext>>, _queue_data: ArcBorrow<'_, Self>) {}
- fn init_hctx(_tagset_data: (), _hctx_idx: u32) -> Result<Self::HwData> {
- KBox::pin_init(new_spinlock!(HwQueueContext { page: None }), GFP_KERNEL)
+ fn poll(
+ hw_data: Pin<&SpinLock<HwQueueContext>>,
+ _this: ArcBorrow<'_, Self>,
+ batch: &mut IoCompletionBatch<Self>,
+ ) -> Result<bool> {
+ let mut guard = hw_data.lock();
+ let mut completed = false;
+
+ while let Some(rq) = guard.poll_queue.pop_tail() {
+ let status = rq.data_ref().error.load(ordering::Relaxed);
+ rq.data_ref().error.store(0, ordering::Relaxed);
+
+ // TODO: check error handling via status
+ if let Err(rq) = batch.add_request(rq, status != 0) {
+ Self::end_request(rq);
+ }
+
+ completed = true;
+ }
+
+ Ok(completed)
+ }
+
+ fn init_hctx(tagset_data: &NullBlkTagsetData, _hctx_idx: u32) -> Result<Self::HwData> {
+ KBox::pin_init(
+ new_spinlock!(HwQueueContext {
+ page: None,
+ poll_queue: kernel::alloc::ringbuffer::KRingBuffer::new(
+ tagset_data.queue_depth.try_into()?,
+ GFP_KERNEL,
+ )?,
+ }),
+ GFP_KERNEL,
+ )
}
fn complete(rq: ARef<mq::Request<Self>>) {
@@ -849,4 +928,34 @@ fn report_zones(
) -> Result<u32> {
Self::report_zones_internal(disk, sector, nr_zones, callback)
}
+
+ fn map_queues(tag_set: Pin<&mut TagSet<Self>>) {
+ let mut submit_queue_count = tag_set.data().submit_queue_count;
+ let mut poll_queue_count = tag_set.data().poll_queue_count;
+
+ if tag_set.hw_queue_count() != submit_queue_count + poll_queue_count {
+ pr_warn!(
+ "tag set has unexpected hardware queue count: {}\n",
+ tag_set.hw_queue_count()
+ );
+ submit_queue_count = 1;
+ poll_queue_count = 0;
+ }
+
+ let mut offset = 0;
+ tag_set
+ .update_maps(|mut qmap| {
+ use mq::QueueType::*;
+ let queue_count = match qmap.kind() {
+ Default => submit_queue_count,
+ Read => 0,
+ Poll => poll_queue_count,
+ };
+ qmap.set_queue_count(queue_count);
+ qmap.set_offset(offset);
+ offset += queue_count;
+ qmap.map_queues();
+ })
+ .unwrap()
+ }
}
--
2.51.2
* [PATCH v2 59/83] block: rnull: add REQ_OP_FLUSH support
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add support for handling flush requests in rnull. When memory backing
and write cache are enabled, flush requests trigger a cache flush
operation that writes all dirty cache pages to the backing store.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
drivers/block/rnull/disk_storage.rs | 45 +++++++++++++++++++++++++++++++------
drivers/block/rnull/rnull.rs | 31 +++++++++++++++++--------
2 files changed, 60 insertions(+), 16 deletions(-)
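
Note: the flush loop can be modelled in userspace as draining a write-back cache into a backing store. This is a simplified sketch under stated assumptions, not the driver code. `Storage`, `Page`, and `next_flush_index` are hypothetical, `BTreeMap` stands in for the XArray, and locking, sheaves, and error handling are left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a page of data.
type Page = Vec<u8>;

/// Userspace model of a write-back cache in front of a backing store.
struct Storage {
    cache: BTreeMap<u64, Page>,
    disk: BTreeMap<u64, Page>,
    next_flush_index: u64,
}

impl Storage {
    /// Remove one page from the cache, searching circularly from
    /// `next_flush_index`, and write it to the backing store.
    /// Returns `None` when the cache is empty.
    fn extract_cache_page(&mut self) -> Option<u64> {
        let index = self
            .cache
            .range(self.next_flush_index..)
            .next()
            // Wrap around to the lowest index if nothing is found above.
            .or_else(|| self.cache.iter().next())
            .map(|(index, _)| *index)?;
        let page = self.cache.remove(&index)?;
        self.disk.insert(index, page);
        self.next_flush_index = index.wrapping_add(1);
        Some(index)
    }

    /// Drain the cache until no pages remain, returning how many pages
    /// were written back.
    fn flush(&mut self) -> usize {
        let mut written = 0;
        while self.extract_cache_page().is_some() {
            written += 1;
        }
        written
    }
}
```

Returning `Option` from the extraction step is what lets the flush loop terminate on an empty cache. A caller that knows the cache is full can still treat `None` as a bug.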
diff --git a/drivers/block/rnull/disk_storage.rs b/drivers/block/rnull/disk_storage.rs
index 82de1f656f68..7667830bd616 100644
--- a/drivers/block/rnull/disk_storage.rs
+++ b/drivers/block/rnull/disk_storage.rs
@@ -85,6 +85,13 @@ pub(crate) fn discard(
remaining_bytes -= processed;
}
}
+
+ pub(crate) fn flush(&self, hw_data: &Pin<&SpinLock<HwQueueContext>>) -> Result {
+ let mut tree_guard = self.lock();
+ let mut hw_data_guard = hw_data.lock();
+ let mut access = self.access(&mut tree_guard, &mut hw_data_guard, None);
+ access.flush()
+ }
}
pub(crate) struct DiskStorageAccess<'a, 'b, 'c> {
@@ -120,18 +127,32 @@ fn to_sector(index: usize) -> u64 {
(index << block::PAGE_SECTORS_SHIFT) as u64
}
+ fn extract_cache_page(&mut self) -> Result<Option<KBox<NullBlockPage>>> {
+ Self::extract_cache_page_inner(
+ &mut self.cache_guard,
+ &mut self.disk_guard,
+ self.disk_storage,
+ self.hw_data_guard,
+ self.sheaf.as_mut(),
+ )
+ }
+
fn extract_cache_page_inner<'g>(
cache_guard: &mut xarray::Guard<'g, TreeNode>,
disk_guard: &mut xarray::Guard<'g, TreeNode>,
disk_storage: &DiskStorage,
hw_data: &mut HwQueueContext,
sheaf: Option<&mut XArraySheaf<'_>>,
- ) -> Result<KBox<NullBlockPage>> {
- let cache_entry = cache_guard
- .find_next_entry_circular(
- disk_storage.next_flush_sector.load(ordering::Relaxed) as usize
- )
- .expect("Expected to find a page in the cache");
+ ) -> Result<Option<KBox<NullBlockPage>>> {
+ let cache_entry = cache_guard.find_next_entry_circular(
+ disk_storage.next_flush_sector.load(ordering::Relaxed) as usize,
+ );
+
+ let cache_entry = if let Some(entry) = cache_entry {
+ entry
+ } else {
+ return Ok(None);
+ };
let index = cache_entry.index();
@@ -172,7 +193,16 @@ fn extract_cache_page_inner<'g>(
}
};
- Ok(page)
+ Ok(Some(page))
+ }
+
+ fn flush(&mut self) -> Result {
+ if self.disk_storage.cache_size > 0 {
+ while let Some(page) = self.extract_cache_page()? {
+ drop(page);
+ }
+ }
+ Ok(())
}
fn get_cache_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
@@ -197,6 +227,7 @@ fn get_cache_page(&mut self, sector: u64) -> Result<&mut NullBlockPage> {
self.hw_data_guard,
self.sheaf.as_mut(),
)?
+ .expect("Expected to find a page in the cache")
};
let xarray::Entry::Vacant(vacant_entry) = cache_guard.entry(index) else {
unreachable!("slot was vacant and we hold the lock")
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index edb4ef53d6ad..0695cbd07f1d 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -719,6 +719,18 @@ fn end_request(rq: Owned<mq::Request<Self>>) {
_ => rq.end(bindings::BLK_STS_IOERR),
}
}
+
+ fn complete_request(&self, rq: Owned<mq::Request<Self>>) {
+ match self.irq_mode {
+ IRQMode::None => Self::end_request(rq),
+ IRQMode::Soft => mq::Request::complete(rq.into()),
+ IRQMode::Timer => {
+ OwnableRefCounted::into_shared(rq)
+ .start(self.completion_time)
+ .dismiss();
+ }
+ }
+ }
}
impl_has_hr_timer! {
@@ -835,6 +847,15 @@ fn queue_rq(
let mut rq = rq.start();
+ if rq.command() == mq::Command::Flush {
+ if this.memory_backed {
+ this.storage.flush(&hw_data)?;
+ }
+ this.complete_request(rq);
+
+ return Ok(());
+ }
+
#[cfg(CONFIG_BLK_DEV_ZONED)]
if this.zoned.enabled {
this.handle_zoned_command(&hw_data, &mut rq)?;
@@ -860,15 +881,7 @@ fn queue_rq(
hw_data.lock().poll_queue.push_head(rq)?;
} else {
- match this.irq_mode {
- IRQMode::None => Self::end_request(rq),
- IRQMode::Soft => mq::Request::complete(rq.into()),
- IRQMode::Timer => {
- OwnableRefCounted::into_shared(rq)
- .start(this.completion_time)
- .dismiss();
- }
- }
+ this.complete_request(rq);
}
Ok(())
}
--
2.51.2
* [PATCH v2 47/83] block: rnull: add queue depth config option
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a configfs attribute to configure the queue depth (number of tags)
for the rnull block device.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
drivers/block/rnull/configfs.rs | 5 +++++
drivers/block/rnull/rnull.rs | 11 ++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index a84854e7c358..2dfc87dff66a 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -118,6 +118,7 @@ fn make_group(
mbps: 16,
blocking: 17,
shared_tags: 18,
+ hw_queue_depth: 19
],
};
@@ -153,6 +154,7 @@ fn make_group(
blocking: false,
shared_tags: false,
shared_tag_set: self.shared_tag_set.clone(),
+ hw_queue_depth: 64,
}),
}),
core::iter::empty(),
@@ -231,6 +233,7 @@ struct DeviceConfigInner {
blocking: bool,
shared_tags: bool,
shared_tag_set: Arc<TagSet<NullBlkDevice>>,
+ hw_queue_depth: u32,
}
#[vtable]
@@ -274,6 +277,7 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
blocking: guard.blocking,
memory_backed: guard.memory_backed,
no_sched: guard.no_sched,
+ hw_queue_depth: guard.hw_queue_depth,
},
})?);
guard.powered = true;
@@ -447,3 +451,4 @@ fn store(this: &DeviceConfig, page: &[u8]) -> Result {
configfs_simple_field!(DeviceConfig, 16, mbps, u32);
configfs_simple_bool_field!(DeviceConfig, 17, blocking);
configfs_simple_bool_field!(DeviceConfig, 18, shared_tags);
+configfs_simple_field!(DeviceConfig, 19, hw_queue_depth, u32);
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index bcf6a85f1cbc..491979daa50e 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -147,6 +147,10 @@
default: false,
description: "Share tag set between devices for blk-mq",
},
+ hw_queue_depth: u32 {
+ default: 64,
+ description: "Queue depth for each hardware queue. Default: 64",
+ },
},
}
@@ -172,6 +176,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
let blocking = module_parameters::blocking.value();
let memory_backed = module_parameters::memory_backed.value();
let no_sched = module_parameters::no_sched.value();
+ let hw_queue_depth = module_parameters::hw_queue_depth.value();
let shared_tag_set = NullBlkDevice::build_tag_set(TagSetOptions {
submit_queues,
@@ -179,6 +184,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
blocking,
memory_backed,
no_sched,
+ hw_queue_depth,
})?;
let mut disks = KVec::new();
@@ -209,6 +215,7 @@ fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
blocking,
memory_backed,
no_sched,
+ hw_queue_depth,
},
})?;
disks.push(disk, GFP_KERNEL)?;
@@ -264,6 +271,7 @@ struct TagSetOptions {
blocking: bool,
memory_backed: bool,
no_sched: bool,
+ hw_queue_depth: u32,
}
impl NullBlkDevice {
@@ -276,6 +284,7 @@ fn build_tag_set(options: TagSetOptions) -> Result<Arc<TagSet<Self>>> {
blocking,
memory_backed,
no_sched,
+ hw_queue_depth,
} = options;
if home_node > kernel::numa::num_online_nodes().try_into()? {
@@ -297,7 +306,7 @@ fn build_tag_set(options: TagSetOptions) -> Result<Arc<TagSet<Self>>> {
}
Arc::pin_init(
- TagSet::new(submit_queues, (), 256, 1, numa_node, flags),
+ TagSet::new(submit_queues, (), hw_queue_depth, 1, numa_node, flags),
GFP_KERNEL,
)
}
--
2.51.2
* [PATCH v2 34/83] block: rust: add `hctx` private data support
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux, Andreas Hindborg
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
From: Andreas Hindborg <a.hindborg@samsung.com>
C block device drivers can attach private data to a hardware context
(`struct blk_mq_hw_ctx`). Add support for this feature for Rust block
device drivers via the `Operations::HwData` associated type.
The private data is created in the `init_hctx` callback and stored in
the `driver_data` field of `blk_mq_hw_ctx`. It is passed to `queue_rq`,
`commit_rqs`, and `poll` callbacks, and is released in `exit_hctx`.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
drivers/block/rnull/rnull.rs | 8 +++-
rust/kernel/block/mq.rs | 23 +++++++++-
rust/kernel/block/mq/operations.rs | 88 +++++++++++++++++++++++++++++++-------
3 files changed, 100 insertions(+), 19 deletions(-)
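
Note: the ownership hand-off described in the commit message can be sketched in userspace Rust. This is an illustration only. `HwCtx` and `HwData` are hypothetical stand-ins, and `Box::into_raw` / `Box::from_raw` play the role that `ForeignOwnable::into_foreign` / `from_foreign` play in the patch.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for the C `struct blk_mq_hw_ctx`; only the
/// `driver_data` field matters for this sketch.
struct HwCtx {
    driver_data: *mut c_void,
}

/// Hypothetical per-hardware-queue driver state.
struct HwData {
    hctx_idx: u32,
}

/// Models `init_hctx_callback`: create the data and hand ownership to the
/// C side as an untyped pointer.
fn init_hctx(hctx: &mut HwCtx, hctx_idx: u32) {
    let data = Box::new(HwData { hctx_idx });
    hctx.driver_data = Box::into_raw(data).cast();
}

/// Models the borrow taken in `queue_rq`, `commit_rqs`, and `poll`.
///
/// # Safety
///
/// `hctx.driver_data` must have been set by `init_hctx` and not yet released
/// by `exit_hctx`.
unsafe fn borrow_hw_data(hctx: &HwCtx) -> &HwData {
    // SAFETY: By the safety requirements of this function, the pointer refers
    // to a live `HwData`.
    unsafe { &*hctx.driver_data.cast::<HwData>() }
}

/// Models `exit_hctx_callback`: take ownership back and drop the data.
///
/// # Safety
///
/// Must be called at most once after `init_hctx` for the same `hctx`.
unsafe fn exit_hctx(hctx: &mut HwCtx) {
    // SAFETY: By the safety requirements of this function, the pointer came
    // from `Box::into_raw` in `init_hctx` and has not been freed.
    drop(unsafe { Box::from_raw(hctx.driver_data.cast::<HwData>()) });
    hctx.driver_data = std::ptr::null_mut();
}
```

The borrow helper never takes ownership, so it can be called any number of times between init and exit. Only the exit path reconstructs the owning value.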
diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index ad26a4a8dbbe..0c1bc2f5ae9c 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -534,6 +534,7 @@ impl Operations for NullBlkDevice {
type QueueData = Pin<KBox<QueueData>>;
type RequestData = Pdu;
type TagSetData = ();
+ type HwData = ();
fn new_request_data() -> impl PinInit<Self::RequestData> {
pin_init!(Pdu {
@@ -544,6 +545,7 @@ fn new_request_data() -> impl PinInit<Self::RequestData> {
#[inline(always)]
fn queue_rq(
+ _hw_data: (),
queue_data: Pin<&QueueData>,
mut rq: Owned<mq::Request<Self>>,
_is_last: bool,
@@ -575,7 +577,11 @@ fn queue_rq(
Ok(())
}
- fn commit_rqs(_queue_data: Pin<&QueueData>) {}
+ fn commit_rqs(_hw_data: (), _queue_data: Pin<&QueueData>) {}
+
+ fn init_hctx(_tagset_data: (), _hctx_idx: u32) -> Result {
+ Ok(())
+ }
fn complete(rq: ARef<mq::Request<Self>>) {
Self::end_request(
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index 28cee0d60846..b095cc7f51ce 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -17,6 +17,12 @@
//! - The [`GenDisk`] type that abstracts the C type `struct gendisk`.
//! - The [`Request`] type that abstracts the C type `struct request`.
//!
+//! Many of the C types that this module abstracts allow a driver to carry
+//! private data, either embedded in the struct directly, or as a C `void*`. In
+//! these abstractions, this data is typed. The types of the data are defined by
+//! associated types in `Operations`, see [`Operations::RequestData`] for an
+//! example.
+//!
//! The kernel will interface with the block device driver by calling the method
//! implementations of the `Operations` trait.
//!
@@ -71,6 +77,7 @@
//! impl Operations for MyBlkDevice {
//! type RequestData = ();
//! type QueueData = ();
+//! type HwData = ();
//! type TagSetData = ();
//!
//! fn new_request_data(
@@ -78,12 +85,17 @@
//! Ok(())
//! }
//!
-//! fn queue_rq(_queue_data: (), rq: Owned<Request<Self>>, _is_last: bool) -> Result {
+//! fn queue_rq(
+//! _hw_data: (),
+//! _queue_data: (),
+//! rq: Owned<Request<Self>>,
+//! _is_last: bool
+//! ) -> Result {
//! rq.end_ok();
//! Ok(())
//! }
//!
-//! fn commit_rqs(_queue_data: ()) {}
+//! fn commit_rqs(_hw_data: (), _queue_data: ()) {}
//!
//! fn complete(rq: ARef<Request<Self>>) {
//! OwnableRefCounted::try_from_shared(rq)
@@ -91,6 +103,13 @@
//! .expect("Fatal error - expected to be able to end request")
//! .end_ok();
//! }
+//!
+//! fn init_hctx(
+//! _tagset_data: (),
+//! _hctx_idx: u32,
+//! ) -> Result<Self::HwData> {
+//! Ok(())
+//! }
//! }
//!
//! let tagset: Arc<TagSet<MyBlkDevice>> =
diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs
index 093bb21fa1b2..1b20df25d6df 100644
--- a/rust/kernel/block/mq/operations.rs
+++ b/rust/kernel/block/mq/operations.rs
@@ -63,6 +63,13 @@ pub trait Operations: Sized {
/// the `GenDisk` associated with this `Operations` implementation.
type QueueData: ForeignOwnable + Sync;
+ /// Data associated with a dispatch queue. This is stored as a pointer in the C `struct
+ /// blk_mq_hw_ctx` that represents a hardware queue.
+ ///
+ /// Hardware contexts may be cleaned up by a thread different from the allocating thread, so
+ /// `HwData` must be `Send`.
+ type HwData: ForeignOwnable + Sync + Send;
+
/// Data associated with a `TagSet`. This is stored as a pointer in `struct
/// blk_mq_tag_set`.
type TagSetData: ForeignOwnable + Sync;
@@ -73,20 +80,30 @@ pub trait Operations: Sized {
/// Called by the kernel to queue a request with the driver. If `is_last` is
/// `false`, the driver is allowed to defer committing the request.
fn queue_rq(
+ hw_data: ForeignBorrowed<'_, Self::HwData>,
queue_data: ForeignBorrowed<'_, Self::QueueData>,
rq: Owned<Request<Self>>,
is_last: bool,
) -> Result;
/// Called by the kernel to indicate that queued requests should be submitted.
- fn commit_rqs(queue_data: ForeignBorrowed<'_, Self::QueueData>);
+ fn commit_rqs(
+ hw_data: ForeignBorrowed<'_, Self::HwData>,
+ queue_data: ForeignBorrowed<'_, Self::QueueData>,
+ );
+
+ /// Called by the kernel to allocate and initialize driver-specific hardware context data.
+ fn init_hctx(
+ tagset_data: ForeignBorrowed<'_, Self::TagSetData>,
+ hctx_idx: u32,
+ ) -> Result<Self::HwData>;
/// Called by the kernel when the request is completed.
fn complete(rq: ARef<Request<Self>>);
/// Called by the kernel to poll the device for completed requests. Only
/// used for poll queues.
- fn poll() -> bool {
+ fn poll(_hw_data: ForeignBorrowed<'_, Self::HwData>) -> bool {
build_error!(crate::error::VTABLE_DEFAULT_ERROR)
}
}
@@ -146,6 +163,11 @@ impl<T: Operations> OperationsVTable<T> {
let mut rq =
unsafe { Owned::from_raw(NonNull::<Request<T>>::new_unchecked((*bd).rq.cast())) };
+ // SAFETY: The safety requirements for this function ensure that `hctx`
+ // is valid and that `driver_data` was produced by a call to
+ // `into_foreign` in `Self::init_hctx_callback`.
+ let hw_data = unsafe { T::HwData::borrow((*hctx).driver_data) };
+
// SAFETY: `hctx` is valid as required by this function.
let queue_data = unsafe { (*(*hctx).queue).queuedata };
@@ -159,6 +181,7 @@ impl<T: Operations> OperationsVTable<T> {
unsafe { rq.start_unchecked() };
let ret = T::queue_rq(
+ hw_data,
queue_data,
rq,
// SAFETY: `bd` is valid as required by the safety requirement for
@@ -181,6 +204,10 @@ impl<T: Operations> OperationsVTable<T> {
/// This function may only be called by blk-mq C infrastructure. The caller
/// must ensure that `hctx` is valid.
unsafe extern "C" fn commit_rqs_callback(hctx: *mut bindings::blk_mq_hw_ctx) {
+ // SAFETY: `driver_data` was installed by us in `init_hctx_callback` as
+ // the result of a call to `into_foreign`.
+ let hw_data = unsafe { T::HwData::borrow((*hctx).driver_data) };
+
// SAFETY: `hctx` is valid as required by this function.
let queue_data = unsafe { (*(*hctx).queue).queuedata };
@@ -189,7 +216,7 @@ impl<T: Operations> OperationsVTable<T> {
// `ForeignOwnable::from_foreign()` is only called when the tagset is
// dropped, which happens after we are dropped.
let queue_data = unsafe { T::QueueData::borrow(queue_data) };
- T::commit_rqs(queue_data)
+ T::commit_rqs(hw_data, queue_data)
}
/// This function is called by the C kernel. A pointer to this function is
@@ -213,12 +240,18 @@ impl<T: Operations> OperationsVTable<T> {
///
/// # Safety
///
- /// This function may only be called by blk-mq C infrastructure.
+ /// This function may only be called by blk-mq C infrastructure. `hctx` must
+ /// be a pointer to a valid and aligned `struct blk_mq_hw_ctx` that was
+ /// previously initialized by a call to `init_hctx_callback`.
unsafe extern "C" fn poll_callback(
- _hctx: *mut bindings::blk_mq_hw_ctx,
+ hctx: *mut bindings::blk_mq_hw_ctx,
_iob: *mut bindings::io_comp_batch,
) -> crate::ffi::c_int {
- T::poll().into()
+ // SAFETY: By function safety requirement, `hctx` was initialized by
+ // `init_hctx_callback` and thus `driver_data` came from a call to
+ // `into_foreign`.
+ let hw_data = unsafe { T::HwData::borrow((*hctx).driver_data) };
+ T::poll(hw_data).into()
}
/// This function is called by the C kernel. A pointer to this function is
@@ -226,15 +259,29 @@ impl<T: Operations> OperationsVTable<T> {
///
/// # Safety
///
- /// This function may only be called by blk-mq C infrastructure. This
- /// function may only be called once before `exit_hctx_callback` is called
- /// for the same context.
+ /// This function may only be called by blk-mq C infrastructure.
+ /// `tagset_data` must be initialized by the initializer returned by
+ /// `TagSet::try_new` as part of tag set initialization. `hctx` must be a
+ /// pointer to a valid `blk_mq_hw_ctx` where the `driver_data` field was not
+ /// yet initialized. This function may only be called once before
+ /// `exit_hctx_callback` is called for the same context.
unsafe extern "C" fn init_hctx_callback(
- _hctx: *mut bindings::blk_mq_hw_ctx,
- _tagset_data: *mut crate::ffi::c_void,
- _hctx_idx: crate::ffi::c_uint,
- ) -> crate::ffi::c_int {
- from_result(|| Ok(0))
+ hctx: *mut bindings::blk_mq_hw_ctx,
+ tagset_data: *mut c_void,
+ hctx_idx: c_uint,
+ ) -> c_int {
+ from_result(|| {
+ // SAFETY: By the safety requirements of this function,
+ // `tagset_data` came from a call to `into_foreign` when the
+ // `TagSet` was initialized.
+ let tagset_data = unsafe { T::TagSetData::borrow(tagset_data) };
+ let data = T::init_hctx(tagset_data, hctx_idx)?;
+
+ // SAFETY: By the safety requirements of this function, `hctx` is
+ // valid for writes.
+ unsafe { (*hctx).driver_data = data.into_foreign().cast() };
+ Ok(0)
+ })
}
/// This function is called by the C kernel. A pointer to this function is
@@ -242,11 +289,20 @@ impl<T: Operations> OperationsVTable<T> {
///
/// # Safety
///
- /// This function may only be called by blk-mq C infrastructure.
+ /// This function may only be called by blk-mq C infrastructure. `hctx` must
+ /// be a valid pointer that was previously initialized by a call to
+ /// `init_hctx_callback`. This function may be called only once after
+ /// `init_hctx_callback` was called.
unsafe extern "C" fn exit_hctx_callback(
- _hctx: *mut bindings::blk_mq_hw_ctx,
+ hctx: *mut bindings::blk_mq_hw_ctx,
_hctx_idx: crate::ffi::c_uint,
) {
+ // SAFETY: By the safety requirements of this function, `hctx` is valid for read.
+ let ptr = unsafe { (*hctx).driver_data };
+
+ // SAFETY: By the safety requirements of this function, `ptr` came from
+ // a call to `into_foreign` in `init_hctx_callback`
+ unsafe { T::HwData::from_foreign(ptr) };
}
/// This function is called by the C kernel. A pointer to this function is
--
2.51.2
* [PATCH v2 72/83] block: rust: add a debug assert for refcounts
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add a debug assertion in `ARef<Request>::dismiss` to verify that the
request refcount is at least two when an `ARef<Request>` exists. This
helps catch reference counting bugs during development.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
rust/kernel/block/mq/request.rs | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index 9c451583e75d..05b167dfc6c6 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -619,9 +619,20 @@ impl<T> RequestTimerHandle<T>
pub fn dismiss(mut self) {
let inner = core::ptr::from_mut(&mut self.inner);
+ debug_assert!(
+ self.inner
+ .wrapper_ref()
+ .refcount()
+ .as_atomic()
+ .load(ordering::Relaxed)
+ >= 2,
+ "Request refcount must be at least two when an ARef<Request> exists"
+ );
+
// SAFETY: `inner` is valid for reads and writes, is properly aligned and nonnull. We have
// exclusive access to `inner` and we do not access `inner` after this call.
unsafe { core::ptr::drop_in_place(inner) };
+
core::mem::forget(self);
}
}
--
2.51.2
* [PATCH v2 60/83] block: rust: add request flags abstraction
From: Andreas Hindborg @ 2026-06-09 19:08 UTC (permalink / raw)
To: Liam R. Howlett, Alice Ryhl, Anna-Maria Behnsen, Benno Lossin,
Björn Roy Baron, Boqun Feng, Danilo Krummrich,
FUJITA Tomonori, Frederic Weisbecker, Gary Guo, Jens Axboe,
John Stultz, Lorenzo Stoakes, Lyude Paul, Miguel Ojeda,
Stephen Boyd, Thomas Gleixner, Trevor Gross, Liam R. Howlett,
Boqun Feng, Lorenzo Stoakes
Cc: Andreas Hindborg, linux-block, linux-kernel, linux-mm,
rust-for-linux
In-Reply-To: <20260609-rnull-v6-19-rc5-send-v2-0-82c7404542e2@kernel.org>
Add the `Flag` enum and `Flags` type as Rust abstractions for the C
`REQ_*` request flags. These flags modify how block I/O requests are
processed, including sync behavior, priority hints, and integrity
settings.
Also add a `flags()` method to `Request` to retrieve the flags for a
given request.
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
---
rust/bindings/bindings_helper.h | 21 ++++++++++++
rust/kernel/block/mq.rs | 2 ++
rust/kernel/block/mq/request.rs | 12 +++++++
rust/kernel/block/mq/request/flag.rs | 65 ++++++++++++++++++++++++++++++++++++
4 files changed, 100 insertions(+)
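
Note: `flags()` masks the operation bits out of `cmd_flags` before building the flag set. The split can be sketched in userspace as below. The value `8` for `REQ_OP_BITS` is an assumption taken from `linux/blk_types.h`, and `FLAG_A`, `FLAG_B`, `split_cmd_flags`, and `has_flag` are hypothetical names used only for illustration.

```rust
/// Assumed width of the operation field in `cmd_flags`.
const REQ_OP_BITS: u32 = 8;
/// Mask selecting only the operation bits.
const REQ_OP_MASK: u32 = (1 << REQ_OP_BITS) - 1;

/// Hypothetical flag bits above the operation field; the real values come
/// from the generated bindings.
const FLAG_A: u32 = 1 << REQ_OP_BITS;
const FLAG_B: u32 = 1 << (REQ_OP_BITS + 3);

/// Split a raw `cmd_flags` word into (operation, flags).
fn split_cmd_flags(cmd_flags: u32) -> (u32, u32) {
    (cmd_flags & REQ_OP_MASK, cmd_flags & !REQ_OP_MASK)
}

/// Check whether every bit of `flag` is set in `flags`.
fn has_flag(flags: u32, flag: u32) -> bool {
    flags & flag == flag
}
```

Keeping the operation and the flags in separate types means a flag check can never accidentally match an operation code.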
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 2a69c17bf271..7acda3ae9725 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -140,6 +140,27 @@ const blk_status_t RUST_CONST_HELPER_BLK_STS_OFFLINE = BLK_STS_OFFLINE;
const blk_status_t RUST_CONST_HELPER_BLK_STS_DURATION_LIMIT = BLK_STS_DURATION_LIMIT;
const blk_status_t RUST_CONST_HELPER_BLK_STS_INVAL = BLK_STS_INVAL;
const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ZONED = BLK_FEAT_ZONED;
+const blk_opf_t RUST_CONST_HELPER_REQ_FAILFAST_DEV = REQ_FAILFAST_DEV;
+const blk_opf_t RUST_CONST_HELPER_REQ_FAILFAST_TRANSPORT = REQ_FAILFAST_TRANSPORT;
+const blk_opf_t RUST_CONST_HELPER_REQ_FAILFAST_DRIVER = REQ_FAILFAST_DRIVER;
+const blk_opf_t RUST_CONST_HELPER_REQ_SYNC = REQ_SYNC;
+const blk_opf_t RUST_CONST_HELPER_REQ_META = REQ_META;
+const blk_opf_t RUST_CONST_HELPER_REQ_PRIO = REQ_PRIO;
+const blk_opf_t RUST_CONST_HELPER_REQ_NOMERGE = REQ_NOMERGE;
+const blk_opf_t RUST_CONST_HELPER_REQ_IDLE = REQ_IDLE;
+const blk_opf_t RUST_CONST_HELPER_REQ_INTEGRITY = REQ_INTEGRITY;
+const blk_opf_t RUST_CONST_HELPER_REQ_FUA = REQ_FUA;
+const blk_opf_t RUST_CONST_HELPER_REQ_PREFLUSH = REQ_PREFLUSH;
+const blk_opf_t RUST_CONST_HELPER_REQ_RAHEAD = REQ_RAHEAD;
+const blk_opf_t RUST_CONST_HELPER_REQ_BACKGROUND = REQ_BACKGROUND;
+const blk_opf_t RUST_CONST_HELPER_REQ_NOWAIT = REQ_NOWAIT;
+const blk_opf_t RUST_CONST_HELPER_REQ_POLLED = REQ_POLLED;
+const blk_opf_t RUST_CONST_HELPER_REQ_ALLOC_CACHE = REQ_ALLOC_CACHE;
+const blk_opf_t RUST_CONST_HELPER_REQ_SWAP = REQ_SWAP;
+const blk_opf_t RUST_CONST_HELPER_REQ_DRV = REQ_DRV;
+const blk_opf_t RUST_CONST_HELPER_REQ_FS_PRIVATE = REQ_FS_PRIVATE;
+const blk_opf_t RUST_CONST_HELPER_REQ_ATOMIC = REQ_ATOMIC;
+const blk_opf_t RUST_CONST_HELPER_REQ_NOUNMAP = REQ_NOUNMAP;
const fop_flags_t RUST_CONST_HELPER_FOP_UNSIGNED_OFFSET = FOP_UNSIGNED_OFFSET;
const xa_mark_t RUST_CONST_HELPER_XA_PRESENT = XA_PRESENT;
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index 23bf95136bc1..9bad95d79230 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -137,6 +137,8 @@
};
pub use request::{
Command,
+ Flag as RequestFlag,
+ Flags as RequestFlags,
IdleRequest,
Request,
RequestTimerHandle, //
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index dbe657a80324..84f8b2c17f85 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -48,6 +48,12 @@
mod command;
pub use command::Command;
+mod flag;
+pub use flag::{
+ Flag,
+ Flags, //
+};
+
/// A [`Request`] that a driver has not yet begun to process.
///
/// A driver can convert an `IdleRequest` to a [`Request`] by calling [`IdleRequest::start`].
@@ -125,6 +131,12 @@ pub fn command(&self) -> Command {
unsafe { Command::from_raw(self.command_raw()) }
}
+ pub fn flags(&self) -> Flags {
+ // SAFETY: By C API contract and type invariant, `cmd_flags` is valid for read
+ let flags = unsafe { (*self.0.get()).cmd_flags & !((1 << bindings::REQ_OP_BITS) - 1) };
+ Flags::try_from(flags).expect("Request should have valid flags")
+ }
+
/// Get the target sector for the request.
#[inline(always)]
pub fn sector(&self) -> u64 {
diff --git a/rust/kernel/block/mq/request/flag.rs b/rust/kernel/block/mq/request/flag.rs
new file mode 100644
index 000000000000..01f249269803
--- /dev/null
+++ b/rust/kernel/block/mq/request/flag.rs
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+use crate::{
+ bindings,
+ impl_flags, //
+};
+
+impl_flags! {
+ /// A set of request flags.
+ ///
+ /// This type wraps the C `REQ_*` flags and allows combining multiple flags
+ /// together. These flags modify how a block I/O request is processed.
+ #[derive(Debug, Clone, Default, Copy, PartialEq, Eq)]
+ pub struct Flags(u32);
+
+ /// Individual request flags for block I/O operations.
+ ///
+ /// These flags correspond to the C `REQ_*` defines in `linux/blk_types.h`
+ /// and are used to modify the behavior of block I/O requests.
+ #[derive(Debug, Clone, Copy, PartialEq, Eq)]
+ pub enum Flag {
+ /// No driver retries on device errors.
+ FailfastDev = bindings::REQ_FAILFAST_DEV,
+ /// No driver retries on transport errors.
+ FailfastTransport = bindings::REQ_FAILFAST_TRANSPORT,
+ /// No driver retries on driver errors.
+ FailfastDriver = bindings::REQ_FAILFAST_DRIVER,
+ /// Request is synchronous (sync write or read).
+ Sync = bindings::REQ_SYNC,
+ /// Metadata I/O request.
+ Meta = bindings::REQ_META,
+ /// Boost priority in CFQ scheduler.
+ Priority = bindings::REQ_PRIO,
+ /// Don't merge this request with others.
+ NoMerge = bindings::REQ_NOMERGE,
+ /// Anticipate more I/O after this one.
+ Idle = bindings::REQ_IDLE,
+ /// I/O includes block integrity payload.
+ Integrity = bindings::REQ_INTEGRITY,
+ /// Forced unit access - data must be written to persistent storage
+ /// before command completion is signaled.
+ ForcedUnitAccess = bindings::REQ_FUA,
+ /// Request a cache flush before this operation.
+ Preflush = bindings::REQ_PREFLUSH,
+ /// Read ahead request, can fail anytime.
+ ReadAhead = bindings::REQ_RAHEAD,
+ /// Background I/O operation.
+ Background = bindings::REQ_BACKGROUND,
+ /// Don't wait if the request would block.
+ NoWait = bindings::REQ_NOWAIT,
+ /// Caller polls for completion using `bio_poll`.
+ Polled = bindings::REQ_POLLED,
+ /// Allocate I/O from cache if available.
+ AllocCache = bindings::REQ_ALLOC_CACHE,
+ /// Swap I/O operation.
+ Swap = bindings::REQ_SWAP,
+ /// Reserved for driver use.
+ Driver = bindings::REQ_DRV,
+ /// Reserved for file system (submitter) use.
+ FsPrivate = bindings::REQ_FS_PRIVATE,
+ /// Atomic write operation.
+ Atomic = bindings::REQ_ATOMIC,
+ /// Do not free blocks when zeroing (for write zeroes operations).
+ NoUnmap = bindings::REQ_NOUNMAP,
+ }
+}
--
2.51.2