From: Andreas Hindborg <a.hindborg@kernel.org>
To: Alice Ryhl <aliceryhl@google.com>
Cc: "Boqun Feng" <boqun.feng@gmail.com>,
"Jens Axboe" <axboe@kernel.dk>, "Miguel Ojeda" <ojeda@kernel.org>,
"Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Benno Lossin" <lossin@kernel.org>,
"Trevor Gross" <tmgross@umich.edu>,
"Danilo Krummrich" <dakr@kernel.org>,
"FUJITA Tomonori" <fujita.tomonori@gmail.com>,
"Frederic Weisbecker" <frederic@kernel.org>,
"Lyude Paul" <lyude@redhat.com>,
"Thomas Gleixner" <tglx@kernel.org>,
"Anna-Maria Behnsen" <anna-maria@linutronix.de>,
"John Stultz" <jstultz@google.com>,
"Stephen Boyd" <sboyd@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
linux-block@vger.kernel.org, rust-for-linux@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 05/79] block: rust: change `queue_rq` request type to `Owned`
Date: Wed, 08 Apr 2026 13:54:35 +0200 [thread overview]
Message-ID: <87qzopwttw.fsf@kernel.org> (raw)
In-Reply-To: <CAH5fLggq-1pReNsRtgRfm0Y_PiQvUsXbMMkPkyO9kuhqzB3wrA@mail.gmail.com>
Alice Ryhl <aliceryhl@google.com> writes:
> On Mon, Mar 23, 2026 at 1:08 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> Alice Ryhl <aliceryhl@google.com> writes:
>>
>> > On Mon, Feb 16, 2026 at 12:34:52AM +0100, Andreas Hindborg wrote:
>> >> Simplify the reference counting scheme for `Request` from 4 states to 3
>> >> states. This is achieved by coalescing the zero state between block layer
>> >> owned and uniquely owned by driver.
>> >>
>> >> Implement `Ownable` for `Request` and deliver `Request` to drivers as
>> >> `Owned<Request>`. In this process:
>> >>
>> >> - Move uniqueness assertions out of `rnull` as these are now guaranteed by
>> >> the `Owned` type.
>> >> - Move `start_unchecked`, `try_set_end` and `end_ok` from `Request` to
>> >> `Owned<Request>`, relying on type invariant for uniqueness.
>> >>
>> >> Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
>> >
>> > It would be a lot cleaner if we could implement HrTimerPointer for
>> > Owned<Request> and entirely get rid of the refcount in request so we
>> > don't need ARef<Request> at all.
>> >
>> > Is there a reason we *need* ARef here?
>>
>> There is. Real drivers will need to dma map the data buffers in
>> `Request` to a device. This requires taking a reference on the pages to
>> be mapped, which in turn requires taking a reference on the `Request`.
>>
>> We could split up the reference counts into multiple fields, but that
>> would be less efficient.
>
> So how exactly is the refcount used here? Can you elaborate?
I can try to be more clear.
`Request` objects are created when a driver initializes a device. A
driver initializes a number `Request` equal to the queue depth the
driver supports. That is, if a driver/device supports 16 in-flight
requests, the driver allocates and initializes 16 `Request` objects up
front.
When the kernel wants to issue IO to a block device, it finds an idle
`Request` object and sets it up for the IO operation. Then:
1. The block layer hands off the request to the driver. Ownership of the
request is transferred to the driver. At this point, the driver has a
unique reference to the request (`Owned<Request>`).
2. The driver DMA maps the pages of the request. We have to make sure
the request is not handed back to the block layer while the pages are
mapped. To this end, we take a refcount on the request. The reference
that the driver holds is no longer unique (`ARef<Request>`).
3. The driver instructs a device to carry out the request. The driver
releases its refcount on the request and the device takes a refcount.
4. When the device finishes processing the request, ownership is
transferred back to the driver. The device releases it's refcount and
the driver takes a refcount.
5. DMA mappings are torn down. The refcount associated with the DMA
mappings is released.
6. The driver transfers ownership of the request back to the block
layer. To do this, the request must be uniquely owned by the driver.
When the device is done processing a request and we have to transfer
ownership of the request back to the driver, we use an API function
called `tag_to_rq`. This function takes an integer tag and may return a
request reference. For this function to be safe, we have to be able to
assert that the integer tag passed to the function is naming a request
object that it is valid to obtain a reference to. It is not sound to
create references to requests that are not currently in flight. Thus, we
must be able to know this information. The current implementation relies
on the refcount to discover this information:
/// There are three states for a request that the Rust bindings care about:
///
/// - 0: The request is owned by C block layer or is uniquely referenced (by [`Owned<_>`]).
/// - 1: The request is owned by Rust abstractions but is not referenced.
/// - 2+: There is one or more [`ARef`] instances referencing the request.
So, we are using 1 refcount field to encode all the information we need.
Further, in the current implementation, for step 3, the device does not
actually take a refcount on the request. If a driver drops all
references to a request, the refcount lands on 1. We use this to
indicate that the request has been leaked and to know that `tag_to_rq`
is safe. In a situation where the request is DMA mapped, the refcount
would be 2 while the device is processing the request.
Here is a sequence diagram of the flow:
Block Layer Driver Device
─────────── ────── ──────
| | |
| | |
(1) | hand off request | |
|-------------------->| |
| | Owned<Request> |
| | refcount = 0 |
| | |
| | |
(2) | DMA map pages |
| |---------. |
| | into_shared() |
| | ARef<Request> |
| | refcount = 2 |
| | map_pages() |
| | refcount = 3 |
| | |
| | |
(3) | submit to device |
| |--------------------->|
| | (drop driver ref) |
| | refcount = 2 |
| | (DMA ref remains) |
| | |
| | |
| | .----------------. |
| | | Device does | |
| | | DMA to/from | |
| | | mapped pages | |
| | '----------------' |
| | |
| | |
(4) | completion IRQ |
| |<---------------------|
| | refcount = 2 |
| | tag_to_rq(tag) |
| | ARef<Request> |
| | refcount = 3 |
| | |
| | |
(5) | tear down DMA mappings |
| |---------. |
| | (drop DMA ref) |
| | refcount = 2 |
| | |
| | |
(6) | hand back request | |
|<--------------------| |
| | try_from_shared() |
| | cmpxchg(2 -> 0) |
| | Owned<Request> |
| | refcount = 0 |
| | end_ok() |
| | |
We might be able to get by without shared references (`ARef<Request>`)
and only use an owned reference (`Owned<Request>`) if we add additional
fields to the `Reqeuest` structure. We need to track if the request is
in a state where we can return an `Owned<Requst>` from `tag_to_rq` , and
we need to track if any of the pages of the request are mapped for DMA,
so that we can end the request.
I am not convinced that having the additional fields is worth it for
simplifying the reference counting scheme.
> With regards to the Owned series, I still think we should split it up
> so that the patches making ARef+Owned work like Arc/UniqueArc is
> separate follow-up series.
I'll take that into consideration when sending the next spin of that series.
Best regards,
Andreas Hindborg
next prev parent reply other threads:[~2026-04-08 11:54 UTC|newest]
Thread overview: 114+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-15 23:34 [PATCH 00/79] block: rnull: complete the rust null block driver Andreas Hindborg
2026-02-15 23:34 ` [PATCH 01/79] block: rnull: adopt new formatting guidelines Andreas Hindborg
2026-03-16 9:14 ` Alice Ryhl
2026-02-15 23:34 ` [PATCH 02/79] block: rnull: add module parameters Andreas Hindborg
2026-03-16 9:18 ` Alice Ryhl
2026-02-15 23:34 ` [PATCH 03/79] block: rnull: add macros to define configfs attributes Andreas Hindborg
2026-03-16 9:23 ` Alice Ryhl
2026-02-15 23:34 ` [PATCH 04/79] block: rust: fix generation of bindings to `BLK_STS_.*` Andreas Hindborg
2026-03-16 9:27 ` Alice Ryhl
2026-03-16 11:58 ` Alice Ryhl
2026-02-15 23:34 ` [PATCH 05/79] block: rust: change `queue_rq` request type to `Owned` Andreas Hindborg
2026-03-16 10:01 ` Alice Ryhl
2026-03-23 12:08 ` Andreas Hindborg
2026-03-24 13:27 ` Alice Ryhl
2026-04-08 11:54 ` Andreas Hindborg [this message]
2026-02-15 23:34 ` [PATCH 06/79] block: rust: add `Request` private data support Andreas Hindborg
2026-03-16 10:10 ` Alice Ryhl
2026-02-15 23:34 ` [PATCH 07/79] block: rust: allow `hrtimer::Timer` in `RequestData` Andreas Hindborg
2026-02-15 23:34 ` [PATCH 08/79] block: rnull: add timer completion mode Andreas Hindborg
2026-02-15 23:34 ` [PATCH 09/79] block: rust: introduce `kernel::block::bio` module Andreas Hindborg
2026-03-16 10:29 ` Alice Ryhl
2026-02-15 23:34 ` [PATCH 10/79] block: rust: add `command` getter to `Request` Andreas Hindborg
2026-03-16 10:30 ` Alice Ryhl
2026-02-15 23:34 ` [PATCH 11/79] block: rust: mq: use GFP_KERNEL from prelude Andreas Hindborg
2026-03-16 10:30 ` Alice Ryhl
2026-02-15 23:34 ` [PATCH 12/79] block: rust: add `TagSet` flags Andreas Hindborg
2026-03-16 10:34 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 13/79] block: rnull: add memory backing Andreas Hindborg
2026-03-16 10:38 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 14/79] block: rnull: add submit queue count config option Andreas Hindborg
2026-03-16 10:40 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 15/79] block: rnull: add `use_per_node_hctx` " Andreas Hindborg
2026-03-16 10:43 ` Alice Ryhl
2026-03-16 11:01 ` Gary Guo
2026-03-16 13:57 ` Andreas Hindborg
2026-03-16 14:02 ` Miguel Ojeda
2026-03-16 14:05 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 16/79] block: rust: allow specifying home node when constructing `TagSet` Andreas Hindborg
2026-03-16 10:46 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 17/79] block: rnull: allow specifying the home numa node Andreas Hindborg
2026-02-15 23:35 ` [PATCH 18/79] block: rust: add Request::sectors() method Andreas Hindborg
2026-03-16 10:48 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 19/79] block: rust: mq: add max_hw_discard_sectors support to GenDiskBuilder Andreas Hindborg
2026-03-16 10:50 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 20/79] block: rnull: add discard support Andreas Hindborg
2026-03-16 10:52 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 21/79] block: rust: add `NoDefaultScheduler` flag for `TagSet` Andreas Hindborg
2026-03-16 10:53 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 22/79] block: rnull: add no_sched module parameter and configfs attribute Andreas Hindborg
2026-03-16 10:54 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 23/79] block: rust: change sector type from usize to u64 Andreas Hindborg
2026-03-16 10:56 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 24/79] block: rust: add `BadBlocks` for bad block tracking Andreas Hindborg
2026-03-16 11:55 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 25/79] block: rust: mq: add Request::end() method for custom status codes Andreas Hindborg
2026-03-16 11:57 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 26/79] block: rnull: add badblocks support Andreas Hindborg
2026-03-16 12:02 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 27/79] block: rnull: add badblocks_once support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 28/79] block: rnull: add partial I/O support for bad blocks Andreas Hindborg
2026-03-16 12:14 ` Alice Ryhl
2026-02-15 23:35 ` [PATCH 29/79] block: rust: add `TagSet` private data support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 30/79] block: rust: add `hctx` " Andreas Hindborg
2026-02-15 23:35 ` [PATCH 31/79] block: rnull: add volatile cache emulation Andreas Hindborg
2026-02-15 23:35 ` [PATCH 32/79] block: rust: implement `Sync` for `GenDisk` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 33/79] block: rust: add a back reference feature to `GenDisk` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 34/79] block: rust: introduce an idle type state for `Request` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 35/79] block: rust: add a request queue abstraction Andreas Hindborg
2026-02-15 23:35 ` [PATCH 36/79] block: rust: add a method to get the request queue for a request Andreas Hindborg
2026-02-15 23:35 ` [PATCH 37/79] block: rust: introduce `kernel::block::error` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 38/79] block: rust: require `queue_rq` to return a `BlkResult` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 39/79] block: rust: add `GenDisk::queue_data` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 40/79] block: rnull: add bandwidth limiting Andreas Hindborg
2026-02-15 23:35 ` [PATCH 41/79] block: rnull: add blocking queue mode Andreas Hindborg
2026-02-15 23:35 ` [PATCH 42/79] block: rnull: add shared tags Andreas Hindborg
2026-02-15 23:35 ` [PATCH 43/79] block: rnull: add queue depth config option Andreas Hindborg
2026-02-15 23:35 ` [PATCH 44/79] block: rust: add an abstraction for `bindings::req_op` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 45/79] block: rust: add a method to set the target sector of a request Andreas Hindborg
2026-02-15 23:35 ` [PATCH 46/79] block: rust: move gendisk vtable construction to separate function Andreas Hindborg
2026-02-15 23:35 ` [PATCH 47/79] block: rust: add zoned block device support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 48/79] block: rnull: add zoned storage support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 49/79] block: rust: add `map_queues` support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 50/79] block: rust: add an abstraction for `struct blk_mq_queue_map` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 51/79] block: rust: add polled completion support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 52/79] block: rust: add accessors to `TagSet` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 53/79] block: rnull: add polled completion support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 54/79] block: rnull: add REQ_OP_FLUSH support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 55/79] block: rust: add request flags abstraction Andreas Hindborg
2026-02-15 23:35 ` [PATCH 56/79] block: rust: add abstraction for block queue feature flags Andreas Hindborg
2026-02-15 23:35 ` [PATCH 57/79] block: rust: allow setting write cache and FUA flags for `GenDisk` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 58/79] block: rust: add `Segment::copy_to_page_limit` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 59/79] block: rnull: add fua support Andreas Hindborg
2026-02-15 23:35 ` [PATCH 60/79] block: fix arg type in `blk_mq_update_nr_hw_queues` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 61/79] block: rust: add `GenDisk::tag_set` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 62/79] block: rust: add `TagSet::update_hw_queue_count` Andreas Hindborg
2026-02-16 23:59 ` Ken Kurematsu
2026-02-17 9:54 ` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 63/79] block: rnull: add an option to change the number of hardware queues Andreas Hindborg
2026-02-15 23:35 ` [PATCH 64/79] block: rust: add an abstraction for `struct rq_list` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 65/79] block: rust: add `queue_rqs` vtable hook Andreas Hindborg
2026-02-15 23:35 ` [PATCH 66/79] block: rnull: support queue_rqs Andreas Hindborg
2026-02-15 23:35 ` [PATCH 67/79] block: rust: remove the `is_poll` parameter from `queue_rq` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 68/79] block: rust: add a debug assert for refcounts Andreas Hindborg
2026-02-15 23:35 ` [PATCH 69/79] block: rust: add `TagSet::tag_to_rq` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 70/79] block: rust: add `Request::queue_index` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 71/79] block: rust: add `Request::requeue` Andreas Hindborg
2026-02-15 23:35 ` [PATCH 72/79] block: rust: add `request_timeout` hook Andreas Hindborg
2026-02-15 23:36 ` [PATCH 73/79] block: rnull: add fault injection support Andreas Hindborg
2026-02-15 23:36 ` [PATCH 74/79] block: rust: add max_sectors option to `GenDiskBuilder` Andreas Hindborg
2026-02-15 23:36 ` [PATCH 75/79] block: rnull: allow configuration of the maximum IO size Andreas Hindborg
2026-02-15 23:36 ` [PATCH 76/79] block: rust: add `virt_boundary_mask` option to `GenDiskBuilder` Andreas Hindborg
2026-02-15 23:36 ` [PATCH 77/79] block: rnull: add `virt_boundary` option Andreas Hindborg
2026-02-15 23:36 ` [PATCH 78/79] block: rnull: add `shared_tag_bitmap` config option Andreas Hindborg
2026-02-15 23:36 ` [PATCH 79/79] block: rnull: add zone offline and readonly configfs files Andreas Hindborg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87qzopwttw.fsf@kernel.org \
--to=a.hindborg@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=aliceryhl@google.com \
--cc=anna-maria@linutronix.de \
--cc=axboe@kernel.dk \
--cc=bjorn3_gh@protonmail.com \
--cc=boqun.feng@gmail.com \
--cc=dakr@kernel.org \
--cc=frederic@kernel.org \
--cc=fujita.tomonori@gmail.com \
--cc=gary@garyguo.net \
--cc=jstultz@google.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=lossin@kernel.org \
--cc=lyude@redhat.com \
--cc=ojeda@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
--cc=sboyd@kernel.org \
--cc=tglx@kernel.org \
--cc=tmgross@umich.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox