From: Benno Lossin <benno.lossin@proton.me>
To: Andreas Hindborg <nmi@metaspace.dk>
Cc: "Jens Axboe" <axboe@kernel.dk>, "Christoph Hellwig" <hch@lst.de>,
	"Keith Busch" <kbusch@kernel.org>,
	"Damien Le Moal" <dlemoal@kernel.org>,
	"Bart Van Assche" <bvanassche@acm.org>,
	"Hannes Reinecke" <hare@suse.de>,
	"Ming Lei" <ming.lei@redhat.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"Andreas Hindborg" <a.hindborg@samsung.com>,
	"Wedson Almeida Filho" <wedsonaf@gmail.com>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Chaitanya Kulkarni" <chaitanyak@nvidia.com>,
	"Luis Chamberlain" <mcgrof@kernel.org>,
	"Yexuan Yang" <1182282462@bupt.edu.cn>,
	"Sergio González Collado" <sergio.collado@gmail.com>,
	"Joel Granados" <j.granados@samsung.com>,
	"Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>,
	"Daniel Gomez" <da.gomez@samsung.com>,
	"Niklas Cassel" <Niklas.Cassel@wdc.com>,
	"Philipp Stanner" <pstanner@redhat.com>,
	"Conor Dooley" <conor@kernel.org>,
	"Johannes Thumshirn" <Johannes.Thumshirn@wdc.com>,
	"Matias Bjørling" <m@bjorling.me>,
	"open list" <linux-kernel@vger.kernel.org>,
	"rust-for-linux@vger.kernel.org" <rust-for-linux@vger.kernel.org>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"gost.dev@samsung.com" <gost.dev@samsung.com>
Subject: Re: [PATCH v4 1/3] rust: block: introduce `kernel::block::mq` module
Date: Mon, 03 Jun 2024 18:26:08 +0000
Message-ID: <925fe0fe-9303-4f49-b473-c3a3ecc5e2e6@proton.me>
In-Reply-To: <87mso2me0p.fsf@metaspace.dk>

On 03.06.24 14:01, Andreas Hindborg wrote:
> Benno Lossin <benno.lossin@proton.me> writes:
>> On 01.06.24 15:40, Andreas Hindborg wrote:
>>> +impl seal::Sealed for Initialized {}
>>> +impl GenDiskState for Initialized {
>>> +    const DELETE_ON_DROP: bool = false;
>>> +}
>>> +impl seal::Sealed for Added {}
>>> +impl GenDiskState for Added {
>>> +    const DELETE_ON_DROP: bool = true;
>>> +}
>>> +
>>> +impl<T: Operations> GenDisk<T, Initialized> {
>>> +    /// Try to create a new `GenDisk`.
>>> +    pub fn try_new(tagset: Arc<TagSet<T>>) -> Result<Self> {
>>
>> Since there is no non-try `new` function, I think we should name this
>> function just `new`.
> 
> Right, I am still getting used to the new naming scheme. Do you know if
> it is documented anywhere?

I don't think it is documented; it might only be a verbal convention at
the moment. That said, [1] suggests `new` for general constructors.
Since this is the only constructor, one could argue that the
recommendation is to use `new` (which I personally think is a good idea).

[1]: https://rust-lang.github.io/api-guidelines/naming.html
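
To illustrate the naming point, here is a minimal userspace sketch. The
`Disk` type and `DiskError` are made up for the example and are not part
of the patch; the only point is that a sole, fallible constructor can
still be called `new` and return a `Result`.

```rust
/// Hypothetical stand-in for a type with a single, fallible constructor.
#[derive(Debug)]
pub struct Disk {
    capacity_sectors: u64,
}

/// Hypothetical error type, only here to make the example self-contained.
#[derive(Debug, PartialEq)]
pub enum DiskError {
    ZeroCapacity,
}

impl Disk {
    /// The only constructor, so it is named `new` even though it can fail.
    pub fn new(capacity_sectors: u64) -> Result<Self, DiskError> {
        if capacity_sectors == 0 {
            return Err(DiskError::ZeroCapacity);
        }
        Ok(Self { capacity_sectors })
    }

    /// Accessor so callers can observe the stored value.
    pub fn capacity_sectors(&self) -> u64 {
        self.capacity_sectors
    }
}
```

A `try_new` name would then only be needed if an infallible `new`
existed next to it.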

[...]

>>> +impl<T: Operations> OperationsVTable<T> {
>>> +    /// This function is called by the C kernel. A pointer to this function is
>>> +    /// installed in the `blk_mq_ops` vtable for the driver.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// - The caller of this function must ensure `bd` is valid
>>> +    ///   and initialized. The pointees must outlive this function.
>>
>> Until when do the pointees have to be alive? "must outlive this
>> function" could also be the case if the pointees die immediately after
>> this function returns.
> 
> It should not be plural. What I intended to communicate is that what
> `bd` points to must be valid for read for the duration of the function
> call. I think that is what "The pointee must outlive this function"
> states? Although when we talk about lifetime of an object pointed to by
> a pointer, I am not sure about the correct way to word this. Do we talk
> about the lifetime of the pointer or the lifetime of the pointed to
> object (the pointee). We should not use the same wording for the pointer
> and the pointee.
> 
> How about:
> 
>     /// - The caller of this function must ensure that the pointee of `bd` is
>     ///   valid for read for the duration of this function.

But this is not enough for it to be sound, right? You create an `ARef`
from `bd.rq`, which potentially lives forever. You somehow need to
require that the pointer `bd` stays valid for reads and (synchronized)
writes until the request is ended (probably via `blk_mq_end_request`).

>>> +    /// - This function must not be called with a `hctx` for which
>>> +    ///   `Self::exit_hctx_callback()` has been called.
>>> +    /// - (*bd).rq must point to a valid `bindings::request` for which
>>> +    ///   `OperationsVTable<T>::init_request_callback` was called
>>
>> Missing `.` at the end.
> 
> Thanks.
> 
>>
>>> +    unsafe extern "C" fn queue_rq_callback(
>>> +        _hctx: *mut bindings::blk_mq_hw_ctx,
>>> +        bd: *const bindings::blk_mq_queue_data,
>>> +    ) -> bindings::blk_status_t {
>>> +        // SAFETY: `bd.rq` is valid as required by the safety requirement for
>>> +        // this function.
>>> +        let request = unsafe { &*(*bd).rq.cast::<Request<T>>() };
>>> +
>>> +        // One refcount for the ARef, one for being in flight
>>> +        request.wrapper_ref().refcount().store(2, Ordering::Relaxed);
>>> +
>>> +        // SAFETY: We own a refcount that we took above. We pass that to `ARef`.
>>> +        // By the safety requirements of this function, `request` is a valid
>>> +        // `struct request` and the private data is properly initialized.
>>> +        let rq = unsafe { Request::aref_from_raw((*bd).rq) };
>>
>> I think that you need to require that the request is alive at least
>> until `blk_mq_end_request` is called for the request (since at that
>> point all `ARef`s will be gone).
>> Also if this is not guaranteed, the safety requirements of
>> `AlwaysRefCounted` are violated (since the object can just disappear
>> even if it has refcount > 0 [the refcount refers to the Rust refcount in
>> the `RequestDataWrapper`, not the one in C]).
> 
> Yea, for the last invariant of `Request`:
> 
>   /// * `self` is reference counted by atomic modification of
>   ///   self.wrapper_ref().refcount().
> 
> I will add this to the safety comment at the call site:
> 
>   //  - `rq` will be alive until `blk_mq_end_request` is called and is
>   //    reference counted by `ARef` until then.

Seems like you already want to use this here :)

[...]

>>> +    /// This function is called by the C kernel. A pointer to this function is
>>> +    /// installed in the `blk_mq_ops` vtable for the driver.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// This function may only be called by blk-mq C infrastructure. `set` must

`set` doesn't exist (`_set` does). You are also not using this
requirement.

>>> +    /// point to an initialized `TagSet<T>`.
>>> +    unsafe extern "C" fn init_request_callback(
>>> +        _set: *mut bindings::blk_mq_tag_set,
>>> +        rq: *mut bindings::request,
>>> +        _hctx_idx: core::ffi::c_uint,
>>> +        _numa_node: core::ffi::c_uint,
>>> +    ) -> core::ffi::c_int {
>>> +        from_result(|| {
>>> +            // SAFETY: The `blk_mq_tag_set` invariants guarantee that all
>>> +            // requests are allocated with extra memory for the request data.
>>
>> What guarantees that the right amount of memory has been allocated?
>> AFAIU that is guaranteed by the `TagSet` (but there is no invariant).
> 
> It is by C API contract. `TagSet::try_new` (now `new`) writes
> `cmd_size` into the `struct blk_mq_tag_set`. That is picked up by
> `blk_mq_alloc_tag_set` to allocate the right amount of space for each request.
> 
> The invariant here is on the C type. Perhaps the wording is wrong. I am
> not exactly sure how to express this. How about this:
> 
>             // SAFETY: We instructed `blk_mq_alloc_tag_set` to allocate requests
>             // with extra memory for the request data when we called it in
>             // `TagSet::new`.

I think you need a safety requirement on the function: `rq` points to a
valid `Request`. Then you could just use `Request::wrapper_ptr` instead
of the line below.

>>> +            let pdu = unsafe { bindings::blk_mq_rq_to_pdu(rq) }.cast::<RequestDataWrapper>();
>>> +
>>> +            // SAFETY: The refcount field is allocated but not initialized, this
>>> +            // valid for write.
>>> +            unsafe { RequestDataWrapper::refcount_ptr(pdu).write(AtomicU64::new(0)) };
>>> +
>>> +            Ok(0)
>>> +        })
>>> +    }
>>
>> [...]
>>
>>> +    /// Notify the block layer that a request is going to be processed now.
>>> +    ///
>>> +    /// The block layer uses this hook to do proper initializations such as
>>> +    /// starting the timeout timer. It is a requirement that block device
>>> +    /// drivers call this function when starting to process a request.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// The caller must have exclusive ownership of `self`, that is
>>> +    /// `self.wrapper_ref().refcount() == 2`.
>>> +    pub(crate) unsafe fn start_unchecked(this: &ARef<Self>) {
>>> +        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>>> +        // existence of `&mut self` we have exclusive access.
>>
>> We don't have a `&mut self`. But the safety requirements ask for a
>> unique `ARef`.
> 
> Thanks, I'll rephrase to:
> 
>         // SAFETY: By type invariant, `self.0` is a valid `struct request` and
>         // we have exclusive access.
> 
>>
>>> +        unsafe { bindings::blk_mq_start_request(this.0.get()) };
>>> +    }
>>> +
>>> +    fn try_set_end(this: ARef<Self>) -> Result<ARef<Self>, ARef<Self>> {
>>> +        // We can race with `TagSet::tag_to_rq`
>>> +        match this.wrapper_ref().refcount().compare_exchange(
>>> +            2,
>>> +            0,
>>> +            Ordering::Relaxed,
>>> +            Ordering::Relaxed,
>>> +        ) {
>>> +            Err(_old) => Err(this),
>>> +            Ok(_) => Ok(this),
>>> +        }
>>> +    }
>>> +
>>> +    /// Notify the block layer that the request has been completed without errors.
>>> +    ///
>>> +    /// This function will return `Err` if `this` is not the only `ARef`
>>> +    /// referencing the request.
>>> +    pub fn end_ok(this: ARef<Self>) -> Result<(), ARef<Self>> {
>>> +        let this = Self::try_set_end(this)?;
>>> +        let request_ptr = this.0.get();
>>> +        core::mem::forget(this);
>>> +
>>> +        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>>> +        // existence of `&mut self` we have exclusive access.
>>
>> Same here, but in this case, the `ARef` is unique, since you called
>> `try_set_end`. You could make it a `# Guarantee` of `try_set_end`: "If
>> `Ok(aref)` is returned, then the `aref` is unique."
> 
> Makes sense. I have not seen `# Guarantee` used anywhere. Do you have a link for that use?

Alice used it a couple of times, e.g. in [2]. I plan on putting it in the
safety standard.

[2]: https://lore.kernel.org/rust-for-linux/20230601134946.3887870-2-aliceryhl@google.com/
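
As a rough sketch of what such a section could look like, here is a
userspace model of the compare-exchange from `try_set_end`. The `Handle`
type is hypothetical and only stands in for the refcount in the request
wrapper; the `Relaxed` orderings simply mirror the patch and are not a
statement about what the final code should use.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the refcount kept in the request's private data.
pub struct Handle {
    refcount: AtomicU64,
}

impl Handle {
    /// Create a handle with the given initial count.
    pub fn new(count: u64) -> Self {
        Self {
            refcount: AtomicU64::new(count),
        }
    }

    /// Try to move the refcount from 2 to 0.
    ///
    /// On failure, the count that was observed is returned in `Err`.
    ///
    /// # Guarantees
    ///
    /// If `Ok(())` is returned, the count was exactly 2 at the moment of the
    /// exchange and has been set to 0.
    pub fn try_set_end(&self) -> Result<(), u64> {
        self.refcount
            .compare_exchange(2, 0, Ordering::Relaxed, Ordering::Relaxed)
            .map(|_| ())
    }
}
```

A caller can then cite the `# Guarantees` section in its own `SAFETY`
comment instead of re-deriving uniqueness at every call site.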

---
Cheers,
Benno

