Linux block layer
* Re: [PATCH v2 4/7] rust: drm: set fops.owner from driver module pointer
From: Gary Guo @ 2026-06-18 14:15 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block
In-Reply-To: <20260521-fix-fops-owner-v2-4-fd99079c5a04@linux.dev>

On Thu May 21, 2026 at 8:52 AM BST, Alvin Sun wrote:
> Change `create_fops()` to accept an owner module pointer instead of
> hardcoding `null_mut()`, ensuring the kernel correctly tracks the
> module owning the DRM device's file operations.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
> ---
>  rust/kernel/drm/device.rs  | 3 ++-
>  rust/kernel/drm/gem/mod.rs | 4 ++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/rust/kernel/drm/device.rs b/rust/kernel/drm/device.rs
> index 403fc35353c74..53e44a780ae97 100644
> --- a/rust/kernel/drm/device.rs
> +++ b/rust/kernel/drm/device.rs
> @@ -111,7 +111,8 @@ impl<T: drm::Driver> Device<T> {
>          fops: &Self::GEM_FOPS,
>      };
>  
> -    const GEM_FOPS: bindings::file_operations = drm::gem::create_fops();
> +    const GEM_FOPS: bindings::file_operations =
> +        drm::gem::create_fops(<T::ThisModule as crate::ModuleMetadata>::THIS_MODULE.as_ptr());

I wonder if the assoc type should just be called `Owner` or `OwnerModule`?

Best,
Gary

>  
>      /// Create a new `drm::Device` for a `drm::Driver`.
>      pub fn new(dev: &device::Device, data: impl PinInit<T::Data, Error>) -> Result<ARef<Self>> {
> diff --git a/rust/kernel/drm/gem/mod.rs b/rust/kernel/drm/gem/mod.rs
> index 01b5bd47a3332..9a203efc59116 100644
> --- a/rust/kernel/drm/gem/mod.rs
> +++ b/rust/kernel/drm/gem/mod.rs
> @@ -357,10 +357,10 @@ impl<T: DriverObject> AllocImpl for Object<T> {
>      };
>  }
>  
> -pub(super) const fn create_fops() -> bindings::file_operations {
> +pub(super) const fn create_fops(owner: *mut bindings::module) -> bindings::file_operations {
>      let mut fops: bindings::file_operations = pin_init::zeroed();
>  
> -    fops.owner = core::ptr::null_mut();
> +    fops.owner = owner;
>      fops.open = Some(bindings::drm_open);
>      fops.release = Some(bindings::drm_release);
>      fops.unlocked_ioctl = Some(bindings::drm_ioctl);




* Re: [PATCH v2 2/7] rust: macros: auto-insert ThisModule in #[vtable]
From: Gary Guo @ 2026-06-18 14:13 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block
In-Reply-To: <20260521-fix-fops-owner-v2-2-fd99079c5a04@linux.dev>

On Thu May 21, 2026 at 8:52 AM BST, Alvin Sun wrote:
> Auto-add `type ThisModule: ::kernel::ModuleMetadata;` as a required
> associated type on the trait side if not already defined, and
> auto-insert `type ThisModule = crate::LocalModule;` on the impl side
> if not explicitly provided, eliminating the need to manually declare
> and implement `ThisModule` in every vtable trait and impl.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/all/DIMMWHUOLPSH.13JFRHDKDQJGO@garyguo.net

> ---
>  rust/macros/lib.rs    |  6 ++++++
>  rust/macros/vtable.rs | 38 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/rust/macros/lib.rs b/rust/macros/lib.rs
> index 2cfd59e0f9e7c..d35e45ea745c0 100644
> --- a/rust/macros/lib.rs
> +++ b/rust/macros/lib.rs
> @@ -176,6 +176,12 @@ pub fn module(input: TokenStream) -> TokenStream {
>  ///
>  /// This macro should not be used when all functions are required.
>  ///
> +/// Additionally, this macro automatically handles the `ThisModule`
> +/// associated type: on the trait side, `type ThisModule: ModuleMetadata;`
> +/// is added as a required associated type if not already defined; on the
> +/// impl side, `type ThisModule = LocalModule;` is automatically inserted
> +/// if not explicitly defined.
> +///
>  /// # Examples
>  ///
>  /// ```
> diff --git a/rust/macros/vtable.rs b/rust/macros/vtable.rs
> index c6510b0c4ea1d..d3d0e9cbd7172 100644
> --- a/rust/macros/vtable.rs
> +++ b/rust/macros/vtable.rs
> @@ -23,6 +23,7 @@
>  
>  fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
>      let mut gen_items = Vec::new();
> +    let mut has_this_module = false;
>  
>      gen_items.push(parse_quote! {
>           /// A marker to prevent implementors from forgetting to use [`#[vtable]`](vtable)
> @@ -30,6 +31,28 @@ fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
>           const USE_VTABLE_ATTR: ();
>      });
>  
> +    // Detect existing type ThisModule so we don't add a duplicate.
> +    for i in &item.items {
> +        if let TraitItem::Type(type_item) = i {
> +            if type_item.ident == "ThisModule" {
> +                has_this_module = true;
> +            }
> +        }
> +    }
> +
> +    // Add `type ThisModule: ModuleMetadata` as a required associated type if
> +    // the trait does not already define it. No default is used because
> +    // `associated_type_defaults` is unstable (issue #29661).

I don't think this is relevant. What's the sensible default anyway?

> +    if !has_this_module {

Perhaps just make this a one-liner:

    if !item.items.iter().any(|i| matches!(i, TraitItem::Type(t) if t.ident == "ThisModule")) {
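
For illustration only, a minimal self-contained sketch of that one-liner. The
`TraitItem` enum below is a hypothetical stand-in for `syn::TraitItem`, and
`has_this_module()` is a made-up helper name; neither is taken from the patch.

```rust
/// Hypothetical, simplified stand-in for `syn::TraitItem`.
#[allow(dead_code)]
enum TraitItem {
    Type { ident: String },
    Fn { ident: String },
}

/// Returns true if the trait items already declare `type ThisModule`.
/// Only associated types count; a function with that name is ignored.
fn has_this_module(items: &[TraitItem]) -> bool {
    items
        .iter()
        .any(|i| matches!(i, TraitItem::Type { ident } if ident == "ThisModule"))
}
```

Note that the closure argument (`i`) is what gets matched, so no separate
mutable flag is needed.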

> +        gen_items.push(parse_quote! {
> +            /// The module implementing this vtable trait.
> +            ///
> +            /// Automatically set to `crate::LocalModule` by the `#[vtable]`
> +            /// impl macro.
> +            type ThisModule: ::kernel::ModuleMetadata;
> +        });
> +    }
> +
>      for item in &item.items {
>          if let TraitItem::Fn(fn_item) = item {
>              let name = &fn_item.sig.ident;
> @@ -58,18 +81,31 @@ fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
>  fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
>      let mut gen_items = Vec::new();
>      let mut defined_consts = HashSet::new();
> +    let mut defined_types = HashSet::new();

I'd just rename `defined_consts` to `defined_items` and reuse the same set, as
there cannot be assoc items with the same name anyway.
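
As a rough sketch of the single-set idea (not the actual macro code):
`ImplItem` below is a hypothetical stand-in for `syn::ImplItem`, and the
helper names are invented for the example.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for `syn::ImplItem`.
#[allow(dead_code)]
enum ImplItem {
    Const { ident: String },
    Type { ident: String },
    Fn { ident: String },
}

/// Collects the names of all user-defined associated consts and types
/// into one set, instead of keeping one set per item kind.
fn defined_items(items: &[ImplItem]) -> HashSet<&str> {
    let mut defined = HashSet::new();
    for item in items {
        match item {
            ImplItem::Const { ident } | ImplItem::Type { ident } => {
                defined.insert(ident.as_str());
            }
            ImplItem::Fn { .. } => {}
        }
    }
    defined
}

/// True if the impl did not define `ThisModule` and it must be generated.
fn needs_this_module(items: &[ImplItem]) -> bool {
    !defined_items(items).contains("ThisModule")
}
```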

Best,
Gary

>  
> -    // Iterate over all user-defined constants to gather any possible explicit overrides.
> +    // Iterate over all user-defined constants and types to gather any possible explicit overrides.
>      for item in &item.items {
>          if let ImplItem::Const(const_item) = item {
>              defined_consts.insert(const_item.ident.clone());
>          }
> +        if let ImplItem::Type(type_item) = item {
> +            defined_types.insert(type_item.ident.clone());
> +        }
>      }
>  
>      gen_items.push(parse_quote! {
>          const USE_VTABLE_ATTR: () = ();
>      });
>  
> +    // Auto-insert `type ThisModule = crate::LocalModule` if not explicitly defined.
> +    // `crate::LocalModule` resolves to the real module type (via `module!`) or a
> +    // dummy fallback in non-module contexts (e.g., doctests).
> +    if !defined_types.contains(&parse_quote!(ThisModule)) {
> +        gen_items.push(parse_quote! {
> +            type ThisModule = crate::LocalModule;
> +        });
> +    }
> +
>      for item in &item.items {
>          if let ImplItem::Fn(fn_item) = item {
>              let name = &fn_item.sig.ident;




* Re: [PATCH] block: remove redundant GD_NEED_PART_SCAN in add_disk_final()
From: Christoph Hellwig @ 2026-06-18 14:07 UTC (permalink / raw)
  To: Connor Williamson
  Cc: axboe, linux-block, linux-kernel, stable, yukuai3, hch, jack,
	nh-open-source
In-Reply-To: <20260615130715.53693-1-connordw@amazon.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 1/1] block: validate user space vectors during extraction
From: Keith Busch @ 2026-06-18 13:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-block, linux-fsdevel, dm-devel, axboe, brauner,
	djwong, viro, stable
In-Reply-To: <20260618134346.GA2752@lst.de>

On Thu, Jun 18, 2026 at 03:43:46PM +0200, Christoph Hellwig wrote:
> On Thu, Jun 18, 2026 at 07:17:35AM -0600, Keith Busch wrote:
> > > >  	if (iov_iter_is_bvec(iter)) {
> > > >  		bio_iov_bvec_set(bio, iter);
> > > > +
> > > > +		if (mp_bvec_iter_offset(bio->bi_io_vec, bio->bi_iter) &
> > > > +							vec_align_mask)
> > > > +			return -EINVAL;
> > > 
> > > Can you add a comment here?  Especially as the bvec iter doesn't actually
> > > require all individual bvecs to be aligned and I'm not entirely sure this
> > > handles all cases - writing down the rules might help a bit with that.
> > 
> > The rationale is that the only iter_bvec users come from io_uring
> > registered buffers, which are virtually contiguous.
> 
> There are plenty of iov_iter_bvec users, and even without poking deep I
> know of two that directly pass on bvecs from block-layer generated bios to
> the underlying file system's direct I/O code: loop and zloop.

Oh, I meant only users that go through this direct-io path, but you're
right, I was wrong about that too. The nvme target file backend can also
get here in addition to what you pointed out.
 
> So we need rules on what can be passed, and preferably some way to
> enforce that at least for debug builds.

Yeah.


* Re: [PATCH 1/1] block: validate user space vectors during extraction
From: Christoph Hellwig @ 2026-06-18 13:43 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Keith Busch, linux-block, linux-fsdevel,
	dm-devel, axboe, brauner, djwong, viro, stable
In-Reply-To: <ajPv7yOoYsR5O6kf@kbusch-mbp>

On Thu, Jun 18, 2026 at 07:17:35AM -0600, Keith Busch wrote:
> > >  	if (iov_iter_is_bvec(iter)) {
> > >  		bio_iov_bvec_set(bio, iter);
> > > +
> > > +		if (mp_bvec_iter_offset(bio->bi_io_vec, bio->bi_iter) &
> > > +							vec_align_mask)
> > > +			return -EINVAL;
> > 
> > Can you add a comment here?  Especially as the bvec iter doesn't actually
> > require all individual bvecs to be aligned and I'm not entirely sure this
> > handles all cases - writing down the rules might help a bit with that.
> 
> The rationale is that the only iter_bvec users come from io_uring
> registered buffers, which are virtually contiguous.

There are plenty of iov_iter_bvec users, and even without poking deep I
know of two that directly pass on bvecs from block-layer generated bios to
the underlying file system's direct I/O code: loop and zloop.

So we need rules on what can be passed, and preferably some way to
enforce that at least for debug builds.



* Re: [PATCH 1/1] block: validate user space vectors during extraction
From: Keith Busch @ 2026-06-18 13:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-block, linux-fsdevel, dm-devel, axboe, brauner,
	djwong, viro, stable
In-Reply-To: <20260618102627.GA23200@lst.de>

On Thu, Jun 18, 2026 at 12:26:27PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 17, 2026 at 04:32:35PM -0700, Keith Busch wrote:
> > @@ -1251,6 +1251,11 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter,
> >  
> >  	if (iov_iter_is_bvec(iter)) {
> >  		bio_iov_bvec_set(bio, iter);
> > +
> > +		if (mp_bvec_iter_offset(bio->bi_io_vec, bio->bi_iter) &
> > +							vec_align_mask)
> > +			return -EINVAL;
> 
> Can you add a comment here?  Especially as the bvec iter doesn't actually
> require all individual bvecs to be aligned and I'm not entirely sure this
> handles all cases - writing down the rules might help a bit with that.

The rationale is that the only iter_bvec users come from io_uring
registered buffers, which are virtually contiguous. Subsequent IO
referencing it provides only an offset and a length, so the only
possible misalignment could be the first offset (we've already verified
the total length earlier). Every subsequent vector must be page aligned
at a minimum, which is the largest possible dma alignment the block
layer allows, so we don't need to check the rest.
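
A minimal sketch of that reasoning, valid only under the assumption stated
above (every vector after the first starts page aligned). `PAGE_SIZE`,
`EINVAL`, and `validate_first_offset()` are illustrative names for this
example, not kernel symbols.

```rust
/// Assumed page size for the sketch.
const PAGE_SIZE: usize = 4096;
/// Linux errno value for "invalid argument".
const EINVAL: i32 = 22;

/// Rejects a starting offset that has any bit set in `align_mask`
/// (the required alignment minus one), mirroring the check in the patch.
fn validate_first_offset(first_offset: usize, align_mask: usize) -> Result<(), i32> {
    if (first_offset & align_mask) != 0 {
        Err(-EINVAL)
    } else {
        Ok(())
    }
}
```

A page-aligned start passes any mask up to `PAGE_SIZE - 1`, which is why
only the first offset would need checking if the assumption held.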
 
> >  		ret = iov_iter_extract_bvecs(iter, bio->bi_io_vec,
> >  				BIO_MAX_SIZE - bio->bi_iter.bi_size,
> > -				&bio->bi_vcnt, bio->bi_max_vecs, flags);
> > +				&bio->bi_vcnt, bio->bi_max_vecs,
> > +				vec_align_mask, flags);
> >  		if (ret <= 0) {
> > +			if (ret == -EINVAL) {
> > +				bio_release_pages(bio, false);
> > +				bio_clear_flag(bio, BIO_PAGE_PINNED);
> > +				bio->bi_iter.bi_size = 0;
> > +				bio->bi_vcnt = 0;
> > +				return ret;
> > +			}
> 
> Do we need all these cleanups beyond the bio_release_pages()?  Most
> callers just free the bio, so should not care about it, and the error
> handling in __blkdev_direct_IO that calls bio_endio looks buggy for
> other reasons..

Yeah, it's exactly for the __blkdev_direct_IO() error handling, though I
think clearing either the PINNED flag or bi_vcnt is sufficient after
bio_release_pages(). The rest is just resetting the bio to the initial
state since I didn't want to return both an error and something that
looks like a partially constructed bio, even if no one currently cares.

But since you mention it, __blkdev_direct_IO's handling does look wrong,
so maybe I can clean that up first.


* Re: [PATCH v2 3/5] btrfs: deny freezing a device while it is being removed
From: Johannes Thumshirn @ 2026-06-18 12:56 UTC (permalink / raw)
  To: Christian Brauner, Chris Mason, Jens Axboe, David Sterba,
	Jan Kara
  Cc: Naohiro Aota, Josef Bacik, linux-btrfs, linux-block,
	linux-fsdevel
In-Reply-To: <20260616-work-super-freeze_deny_upstream-v2-3-b3567c7f994b@kernel.org>

Looks good,

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>



* Re: [PATCH v2 1/5] block: allow making a block device unfreezable
From: Johannes Thumshirn @ 2026-06-18 12:47 UTC (permalink / raw)
  To: Christian Brauner, Chris Mason, Jens Axboe, David Sterba,
	Jan Kara
  Cc: Naohiro Aota, Josef Bacik, linux-btrfs, linux-block,
	linux-fsdevel
In-Reply-To: <20260616-work-super-freeze_deny_upstream-v2-1-b3567c7f994b@kernel.org>

Looks good to me,

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>





* Re: [PATCH v2 2/5] block: split bdev_yield_claim() out of bdev_fput()
From: Johannes Thumshirn @ 2026-06-18 12:40 UTC (permalink / raw)
  To: Christian Brauner, Chris Mason, Jens Axboe, David Sterba,
	Jan Kara
  Cc: Naohiro Aota, Josef Bacik, linux-btrfs, linux-block,
	linux-fsdevel
In-Reply-To: <20260616-work-super-freeze_deny_upstream-v2-2-b3567c7f994b@kernel.org>

Looks good to me,

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>



* Re: [PATCH v2 3/7] rust: doctest: add LocalModule fallback for #[vtable] ThisModule
From: Andreas Hindborg @ 2026-06-18 12:13 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, Alvin Sun
In-Reply-To: <20260521-fix-fops-owner-v2-3-fd99079c5a04@linux.dev>

Alvin Sun <alvin.sun@linux.dev> writes:

> Add a `LocalModule` struct with a null-pointer `ModuleMetadata` impl
> in the doctest harness, so that `crate::LocalModule` (auto-inserted
> by `#[vtable]`) resolves correctly when there is no `module!` macro.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>

Does this need to be ordered before the vtable auto insert in the patch series?

Best regards,
Andreas Hindborg




* Re: [PATCH v2 7/7] block: rnull: use `LocalModule` for `THIS_MODULE`
From: Andreas Hindborg @ 2026-06-18 12:17 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, Alvin Sun
In-Reply-To: <20260521-fix-fops-owner-v2-7-fd99079c5a04@linux.dev>

Alvin Sun <alvin.sun@linux.dev> writes:

> Replace the `THIS_MODULE` import with `LocalModule` from the crate,
> consistent with the move of `THIS_MODULE` into the `ModuleMetadata`
> trait.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

You need to squash this with the previous patch.


Best regards,
Andreas Hindborg





* Re: [PATCH v2 2/7] rust: macros: auto-insert ThisModule in #[vtable]
From: Andreas Hindborg @ 2026-06-18 12:11 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, Alvin Sun
In-Reply-To: <20260521-fix-fops-owner-v2-2-fd99079c5a04@linux.dev>

Alvin Sun <alvin.sun@linux.dev> writes:

> Auto-add `type ThisModule: ::kernel::ModuleMetadata;` as a required
> associated type on the trait side if not already defined, and
> auto-insert `type ThisModule = crate::LocalModule;` on the impl side
> if not explicitly provided, eliminating the need to manually declare
> and implement `ThisModule` in every vtable trait and impl.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>


Best regards,
Andreas Hindborg





* Re: [PATCH v2 1/7] rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
From: Andreas Hindborg @ 2026-06-18 12:04 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, Alvin Sun
In-Reply-To: <20260521-fix-fops-owner-v2-1-fd99079c5a04@linux.dev>

"Alvin Sun" <alvin.sun@linux.dev> writes:

> Add a `THIS_MODULE` const to the `ModuleMetadata` trait so that
> modules can provide their `ThisModule` pointer usable in const
> contexts such as static file_operations.
>
> Move the `THIS_MODULE` static from the `module!` macro into the
> `ModuleMetadata` impl, and update `__init` to use
> `LocalModule::THIS_MODULE` instead.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
> ---
>  rust/kernel/lib.rs    |  3 +++
>  rust/macros/module.rs | 34 +++++++++++++++++-----------------
>  2 files changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index b72b2fbe046d6..f0cf0705d9697 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -184,6 +184,9 @@ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, error::Erro
>  pub trait ModuleMetadata {
>      /// The name of the module as specified in the `module!` macro.
>      const NAME: &'static crate::str::CStr;
> +
> +    /// The module's `THIS_MODULE` pointer.
> +    const THIS_MODULE: ThisModule;
>  }
>
>  /// Equivalent to `THIS_MODULE` in the C API.
> diff --git a/rust/macros/module.rs b/rust/macros/module.rs
> index 06c18e2075083..b6d7b3299fbf9 100644
> --- a/rust/macros/module.rs
> +++ b/rust/macros/module.rs
> @@ -497,28 +497,28 @@ pub(crate) fn module(info: ModuleInfo) -> Result<TokenStream> {
>          /// Used by the printing macros, e.g. [`info!`].
>          const __LOG_PREFIX: &[u8] = #name_cstr.to_bytes_with_nul();
>
> -        // SAFETY: `__this_module` is constructed by the kernel at load time and will not be
> -        // freed until the module is unloaded.
> -        #[cfg(MODULE)]
> -        static THIS_MODULE: ::kernel::ThisModule = unsafe {
> -            extern "C" {
> -                static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
> -            };
> -
> -            ::kernel::ThisModule::from_ptr(__this_module.get())
> -        };
> -
> -        #[cfg(not(MODULE))]
> -        static THIS_MODULE: ::kernel::ThisModule = unsafe {
> -            ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
> -        };
> -
>          /// The `LocalModule` type is the type of the module created by `module!`,
>          /// `module_pci_driver!`, `module_platform_driver!`, etc.
>          type LocalModule = #type_;
>
>          impl ::kernel::ModuleMetadata for #type_ {
>              const NAME: &'static ::kernel::str::CStr = #name_cstr;
> +
> +            #[cfg(MODULE)]
> +            const THIS_MODULE: ::kernel::ThisModule = {
> +                extern "C" {
> +                    static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
> +                }
> +
> +                // SAFETY: `__this_module` is constructed by the kernel at load time
> +                // and lives until the module is unloaded.
> +                unsafe { ::kernel::ThisModule::from_ptr(__this_module.get()) }
> +            };
> +
> +            #[cfg(not(MODULE))]
> +            const THIS_MODULE: ::kernel::ThisModule = unsafe {
> +                ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
> +            };
>          }
>
>          // Double nested modules, since then nobody can access the public items inside.
> @@ -616,7 +616,7 @@ pub extern "C" fn #ident_exit() {
>                  /// This function must only be called once.
>                  unsafe fn __init() -> ::kernel::ffi::c_int {
>                      let initer = <super::super::LocalModule as ::kernel::InPlaceModule>::init(
> -                        &super::super::THIS_MODULE
> +                        &<super::super::LocalModule as ::kernel::ModuleMetadata>::THIS_MODULE

Is it possible we could make this more ergonomic? Perhaps by adding a
helper:

  fn this_module<M: ::kernel::ModuleMetadata>() -> &'static ::kernel::ThisModule {
      &M::THIS_MODULE
  }

Then the invocation is a little better:

  let initer = <super::super::LocalModule as ::kernel::InPlaceModule>::init(
      this_module::<super::super::LocalModule>()
  );
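
A self-contained sketch of how such a helper could behave outside the
kernel tree. `ThisModule`, `ModuleMetadata`, and `LocalModule` here are
simplified stand-ins (the `id` field and the module name are invented for
the example); only the shape of `this_module()` follows the suggestion.

```rust
/// Stand-in for `kernel::ThisModule`; the real type wraps a module pointer.
#[derive(Debug, PartialEq)]
struct ThisModule {
    id: usize,
}

/// Stand-in for `kernel::ModuleMetadata` with the new associated const.
trait ModuleMetadata {
    const NAME: &'static str;
    const THIS_MODULE: ThisModule;
}

/// Stand-in for the type generated by `module!`.
struct LocalModule;

impl ModuleMetadata for LocalModule {
    const NAME: &'static str = "sketch_module";
    const THIS_MODULE: ThisModule = ThisModule { id: 1 };
}

/// Borrowing an associated const yields a `'static` reference here because
/// the value has no destructor and no interior mutability.
fn this_module<M: ModuleMetadata>() -> &'static ThisModule {
    &M::THIS_MODULE
}
```

Callers then write `this_module::<LocalModule>()` instead of spelling out
the fully qualified `<LocalModule as ModuleMetadata>::THIS_MODULE` path.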


Best regards,
Andreas Hindborg




* Re: [PATCH v3 6/7] rust: block: rnull: use vertical import style
From: Andreas Hindborg @ 2026-06-18 10:41 UTC (permalink / raw)
  To: Alvin Sun, Arnd Bergmann, Greg Kroah-Hartman, Miguel Ojeda,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jens Axboe,
	Brendan Higgins, David Gow, Rae Moar
  Cc: rust-for-linux, linux-block, linux-kselftest, kunit-dev,
	Alvin Sun
In-Reply-To: <20260521-miscdev-use-format-v3-6-56240ca70d0c@linux.dev>

"Alvin Sun" <alvin.sun@linux.dev> writes:

> Convert `use` imports to vertical layout for better readability and
> maintainability.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>


Acked-by: Andreas Hindborg <a.hindborg@kernel.org>


Best regards,
Andreas Hindborg




* Re: [PATCH v2 4/5] rust: block: mq: use vertical import style
From: Andreas Hindborg @ 2026-06-18 10:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alvin Sun, Arnd Bergmann, Greg Kroah-Hartman, Miguel Ojeda,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, rust-for-linux,
	linux-block, Alvin Sun
In-Reply-To: <20260520-miscdev-use-format-v2-4-64dc48fc1345@linux.dev>

"Alvin Sun" <alvin.sun@linux.dev> writes:

> Convert `use` imports to vertical layout for better readability and
> maintainability.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>


Acked-by: Andreas Hindborg <a.hindborg@kernel.org>

Cc: Jens Axboe <axboe@kernel.dk>

Best regards,
Andreas Hindborg





* Re: [PATCH v2 5/5] rust: block: mq: remove redundant imports and format
From: Andreas Hindborg @ 2026-06-18 10:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alvin Sun, Arnd Bergmann, Greg Kroah-Hartman, Miguel Ojeda,
	Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, rust-for-linux,
	linux-block, Alvin Sun
In-Reply-To: <20260520-miscdev-use-format-v2-5-64dc48fc1345@linux.dev>

"Alvin Sun" <alvin.sun@linux.dev> writes:

> Drop `Result`, `Pin`, `pin_data`, `pinned_drop`, `PinInit`, and
> `try_pin_init` imports already provided by `kernel::prelude`.
>
> Simplify `error` imports and flatten parameters formatting.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Acked-by: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>

@Jens can you pick 4/5 and 5/5?


Best regards,
Andreas Hindborg



* Re: [PATCH 1/1] block: validate user space vectors during extraction
From: Christoph Hellwig @ 2026-06-18 10:26 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-fsdevel, dm-devel, hch, axboe, brauner, djwong,
	viro, Keith Busch, stable
In-Reply-To: <20260617233235.1016063-2-kbusch@meta.com>

On Wed, Jun 17, 2026 at 04:32:35PM -0700, Keith Busch wrote:
> @@ -1242,7 +1242,7 @@ static int bio_iov_iter_align_down(struct bio *bio, struct iov_iter *iter,
>   * is returned only if 0 pages could be pinned.
>   */
>  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter,
> -			   unsigned len_align_mask)
> +			   unsigned len_align_mask, unsigned vec_align_mask)

vec_align_mask needs to be documented in the kernel doc.  And I find
the vec_align_mask name a bit confusing.  This is all about the physical
address (really the dma address, but the page aligned offsets map 1:1),
so maybe phys_align_mask or dma_align_mask might be better names?

Also wouldn't it be more natural to pass the start alignment requirement
before the length alignment parameter?

> @@ -1251,6 +1251,11 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter,
>  
>  	if (iov_iter_is_bvec(iter)) {
>  		bio_iov_bvec_set(bio, iter);
> +
> +		if (mp_bvec_iter_offset(bio->bi_io_vec, bio->bi_iter) &
> +							vec_align_mask)
> +			return -EINVAL;

Can you add a comment here?  Especially as the bvec iter doesn't actually
require all individual bvecs to be aligned and I'm not entirely sure this
handles all cases - writing down the rules might help a bit with that.

>  		ret = iov_iter_extract_bvecs(iter, bio->bi_io_vec,
>  				BIO_MAX_SIZE - bio->bi_iter.bi_size,
> -				&bio->bi_vcnt, bio->bi_max_vecs, flags);
> +				&bio->bi_vcnt, bio->bi_max_vecs,
> +				vec_align_mask, flags);
>  		if (ret <= 0) {
> +			if (ret == -EINVAL) {
> +				bio_release_pages(bio, false);
> +				bio_clear_flag(bio, BIO_PAGE_PINNED);
> +				bio->bi_iter.bi_size = 0;
> +				bio->bi_vcnt = 0;
> +				return ret;
> +			}

Do we need all these cleanups beyond the bio_release_pages()?  Most
callers just free the bio, so should not care about it, and the error
handling in __blkdev_direct_IO that calls bio_endio looks buggy for
other reasons..

> + * @align_mask:	reject with -EINVAL if the source address or length is not
> + *		aligned to this mask

Maybe use the same parameter name as on the bio side here?

And not for this patch, but this makes me wonder if we should handle the
len alignment in iov_iter_extract_bvecs as well, as that should simplify
it quite a bit.



* Re: [PATCH 1/1] block: validate user space vectors during extraction
From: kernel test robot @ 2026-06-18 10:22 UTC (permalink / raw)
  To: Keith Busch, linux-block, linux-fsdevel
  Cc: llvm, oe-kbuild-all, dm-devel, hch, axboe, brauner, djwong, viro,
	Keith Busch, stable
In-Reply-To: <20260617233235.1016063-2-kbusch@meta.com>

Hi Keith,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe/for-next]
[also build test WARNING on brauner-vfs/vfs.all akpm-mm/mm-nonmm-unstable linus/master v7.1 next-20260616]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Keith-Busch/block-validate-user-space-vectors-during-extraction/20260618-073522
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git for-next
patch link:    https://lore.kernel.org/r/20260617233235.1016063-2-kbusch%40meta.com
patch subject: [PATCH 1/1] block: validate user space vectors during extraction
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20260618/202606181254.ohF2ZO9K-lkp@intel.com/config)
compiler: clang version 22.1.8 (https://github.com/llvm/llvm-project ca7933e47d3a3451d81e72ac174dcb5aa28b59d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260618/202606181254.ohF2ZO9K-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606181254.ohF2ZO9K-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: block/bio.c:1245 function parameter 'vec_align_mask' not described in 'bio_iov_iter_get_pages'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
From: Shin'ichiro Kawasaki @ 2026-06-18  8:04 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: Ming Lei, linux-block, Jens Axboe
In-Reply-To: <2371227f-43ef-4a0d-ad8f-da23eea43357@linux.ibm.com>

On Jun 17, 2026 / 16:38, Nilay Shroff wrote:
[...]
> Given the above, I'm fine with the earlier approach of upgrading update_nr_hwq_lock from
> a reader lock to a writer lock in elv_iosched_store(). That directly serializes concurrent
> scheduler updates and avoids the race on q->elevator without introducing additional lock
> ordering concerns.

Thanks for the comment. I will prepare the "writer lock in elv_iosched_store()"
approach as a v2 patch.
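
A userspace sketch of why write mode serializes the updaters, using
`std::sync::RwLock` as a stand-in for the kernel rw_semaphore behind
`update_nr_hwq_lock`. The function name and the counters are invented for
the example; this is not the planned v2 change.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Runs `threads` simulated scheduler switches, each holding the lock in
/// write mode, and returns the largest number of switches that were ever
/// inside the critical section at the same time.
fn max_concurrent_switches(threads: usize) -> usize {
    let lock = Arc::new(RwLock::new(()));
    let inside = Arc::new(AtomicUsize::new(0));
    let max_seen = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            let inside = Arc::clone(&inside);
            let max_seen = Arc::clone(&max_seen);
            thread::spawn(move || {
                // Write mode excludes every other reader and writer.
                let _guard = lock.write().unwrap();
                let now = inside.fetch_add(1, Ordering::SeqCst) + 1;
                max_seen.fetch_max(now, Ordering::SeqCst);
                // The elevator switch would happen here.
                inside.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    max_seen.load(Ordering::SeqCst)
}
```

With `lock.read()` in place of `lock.write()`, several threads could be
inside at once, which corresponds to the race on q->elevator described
above.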


* Re: [PATCH] virtio-blk: use little-endian types for the zoned fields
From: Stefano Garzarella @ 2026-06-18  7:41 UTC (permalink / raw)
  To: Michael Bommarito
  Cc: Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi, Dmitry Fomichev,
	Damien Le Moal, Jens Axboe, Paolo Bonzini, virtualization,
	linux-block, linux-kernel
In-Reply-To: <20260617151727.4071754-1-michael.bommarito@gmail.com>

On Wed, Jun 17, 2026 at 11:17:27AM -0400, Michael Bommarito wrote:
>The zoned block-device fields in the virtio-blk header are typed
>__virtio{32,64}, so their endianness follows VIRTIO_F_VERSION_1. The
>zoned feature is only defined for VIRTIO 1.x devices, and the virtio
>specification defines all of its fields as little-endian. Commit
>b16a1756c716 ("virtio_blk: mark all zone fields LE") tagged them
>__le* for exactly this reason, but commit f1ba4e674feb ("virtio-blk:
>fix to match virtio spec") re-applied the reviewed version of the
>original zoned series -- which predated b16a1756 -- and silently
>restored the __virtio* typing together with the matching
>virtio*_to_cpu() / virtio_cread() accessors in the driver.
>
>Restore the little-endian typing for the zoned configuration-space
>characteristics, the zone descriptor, the zone report header and the
>ZONE_APPEND in-header sector, and read them with le*_to_cpu() and
>virtio_cread_le() to match.
>
>There is no functional change on any spec-compliant device: zoned
>requires VIRTIO_F_VERSION_1, and for a VERSION_1 device
>virtio*_to_cpu() is identical to le*_to_cpu(). The change makes the
>uapi types describe the actual wire format and removes a latent
>endianness mismatch for a (non-conformant) legacy device on a
>big-endian guest.

Not for this patch, but at this point should we also do the same for the
fields gated by the following features, which IIUC were all added in 1.*:
- VIRTIO_BLK_F_MQ
- VIRTIO_BLK_F_DISCARD
- VIRTIO_BLK_F_WRITE_ZEROES
- VIRTIO_BLK_F_SECURE_ERASE
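
As a rough illustration of why the typing only matters for a legacy device
on a big-endian guest, here is a minimal user-space sketch. It is not the
kernel's implementation: decode_virtio32() and its two boolean parameters
are hypothetical stand-ins for a __virtio32 read, assuming the usual rule
that such a field is little-endian once VIRTIO_F_VERSION_1 is negotiated
and guest-native otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode 4 wire bytes as little-endian, independent of host byte order. */
static uint32_t decode_le32(const uint8_t b[4])
{
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Decode 4 wire bytes as big-endian, independent of host byte order. */
static uint32_t decode_be32(const uint8_t b[4])
{
	return (uint32_t)b[3] | (uint32_t)b[2] << 8 |
	       (uint32_t)b[1] << 16 | (uint32_t)b[0] << 24;
}

/*
 * Hypothetical model of a __virtio32 read: little-endian when VERSION_1
 * is negotiated or the guest is little-endian, guest-native (big-endian
 * here) for a legacy device on a big-endian guest.
 */
static uint32_t decode_virtio32(const uint8_t b[4], bool version_1,
				bool guest_big_endian)
{
	if (version_1 || !guest_big_endian)
		return decode_le32(b);
	return decode_be32(b);
}

/* The two decoders only disagree in the legacy + big-endian case. */
static void self_check(void)
{
	const uint8_t wire[4] = { 0x00, 0x02, 0x00, 0x00 }; /* 512 as LE */

	assert(decode_virtio32(wire, true, true) == decode_le32(wire));
	assert(decode_virtio32(wire, true, false) == decode_le32(wire));
	assert(decode_virtio32(wire, false, true) != decode_le32(wire));
}
```

Since the zoned feature requires VERSION_1, only the first two cases can
occur on a conformant device, which is why the change is not functional.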

>
>Fixes: f1ba4e674feb ("virtio-blk: fix to match virtio spec")
>Suggested-by: Michael S. Tsirkin <mst@redhat.com>
>Assisted-by: Claude:claude-opus-4-8
>Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
>---
>Testing:
> - Builds with no new warnings; sparse endian-clean (C=2,
>   __CHECK_ENDIAN__, CONFIG_BLK_DEV_ZONED=y) both before and after.
> - Booted under QEMU with a host-managed zoned device exposed through
>   virtio-blk. Zone revalidation, blkzone report and a sequential
>   write / write-pointer check return correct values; blktests zbd
>   device tests 001-006 (sysfs+ioctl, report zone, reset, write split,
>   write ordering, revalidate) pass, with results identical before and
>   after this change -- expected, since on a VIRTIO_F_VERSION_1 device
>   virtio*_to_cpu() == le*_to_cpu().
>
> drivers/block/virtio_blk.c      | 38 +++++++++++++++------------------
> include/uapi/linux/virtio_blk.h | 18 ++++++++--------
> 2 files changed, 26 insertions(+), 30 deletions(-)
>
>diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>index b1c9a27fe00f3..5532cfbde7bfe 100644
>--- a/drivers/block/virtio_blk.c
>+++ b/drivers/block/virtio_blk.c
>@@ -99,7 +99,7 @@ struct virtblk_req {
> 		 * be the last byte.
> 		 */
> 		struct {
>-			__virtio64 sector;
>+			__le64 sector;
> 			u8 status;
> 		} zone_append;
> 	} in_hdr;
>@@ -335,14 +335,12 @@ static inline void virtblk_request_done(struct request *req)
> {
> 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
> 	blk_status_t status = virtblk_result(virtblk_vbr_status(vbr));
>-	struct virtio_blk *vblk = req->mq_hctx->queue->queuedata;
>
> 	virtblk_unmap_data(req, vbr);
> 	virtblk_cleanup_cmd(req);
>
> 	if (req_op(req) == REQ_OP_ZONE_APPEND)
>-		req->__sector = virtio64_to_cpu(vblk->vdev,
>-						vbr->in_hdr.zone_append.sector);
>+		req->__sector = le64_to_cpu(vbr->in_hdr.zone_append.sector);
>
> 	blk_mq_end_request(req, status);
> }
>@@ -589,13 +587,13 @@ static int virtblk_parse_zone(struct virtio_blk *vblk,
> {
> 	struct blk_zone zone = { };
>
>-	zone.start = virtio64_to_cpu(vblk->vdev, entry->z_start);
>+	zone.start = le64_to_cpu(entry->z_start);
> 	if (zone.start + vblk->zone_sectors <= get_capacity(vblk->disk))
> 		zone.len = vblk->zone_sectors;
> 	else
> 		zone.len = get_capacity(vblk->disk) - zone.start;
>-	zone.capacity = virtio64_to_cpu(vblk->vdev, entry->z_cap);
>-	zone.wp = virtio64_to_cpu(vblk->vdev, entry->z_wp);
>+	zone.capacity = le64_to_cpu(entry->z_cap);
>+	zone.wp = le64_to_cpu(entry->z_wp);
>
> 	switch (entry->z_type) {
> 	case VIRTIO_BLK_ZT_SWR:
>@@ -687,8 +685,7 @@ static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
> 		if (ret)
> 			goto fail_report;
>
>-		nz = min_t(u64, virtio64_to_cpu(vblk->vdev, report->nr_zones),
>-			   nr_zones);
>+		nz = min_t(u64, le64_to_cpu(report->nr_zones), nr_zones);
> 		if (!nz)
> 			break;
>
>@@ -698,8 +695,7 @@ static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
> 			if (ret)
> 				goto fail_report;
>
>-			sector = virtio64_to_cpu(vblk->vdev,
>-						 report->zones[i].z_start) +
>+			sector = le64_to_cpu(report->zones[i].z_start) +
> 				 vblk->zone_sectors;
> 			zone_idx++;
> 		}
>@@ -725,18 +721,18 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
>
> 	lim->features |= BLK_FEAT_ZONED;
>
>-	virtio_cread(vdev, struct virtio_blk_config,
>-		     zoned.max_open_zones, &v);
>+	virtio_cread_le(vdev, struct virtio_blk_config,
>+			zoned.max_open_zones, &v);
> 	lim->max_open_zones = v;
> 	dev_dbg(&vdev->dev, "max open zones = %u\n", v);
>
>-	virtio_cread(vdev, struct virtio_blk_config,
>-		     zoned.max_active_zones, &v);
>+	virtio_cread_le(vdev, struct virtio_blk_config,
>+			zoned.max_active_zones, &v);
> 	lim->max_active_zones = v;
> 	dev_dbg(&vdev->dev, "max active zones = %u\n", v);
>
>-	virtio_cread(vdev, struct virtio_blk_config,
>-		     zoned.write_granularity, &wg);
>+	virtio_cread_le(vdev, struct virtio_blk_config,
>+			zoned.write_granularity, &wg);
> 	if (!wg) {
> 		dev_warn(&vdev->dev, "zero write granularity reported\n");
> 		return -ENODEV;
>@@ -750,8 +746,8 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
> 	 * virtio ZBD specification doesn't require zones to be a power of
> 	 * two sectors in size, but the code in this driver expects that.
> 	 */
>-	virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors,
>-		     &vblk->zone_sectors);
>+	virtio_cread_le(vdev, struct virtio_blk_config, zoned.zone_sectors,
>+			&vblk->zone_sectors);
> 	if (vblk->zone_sectors == 0 || !is_power_of_2(vblk->zone_sectors)) {
> 		dev_err(&vdev->dev,
> 			"zoned device with non power of two zone size %u\n",
>@@ -767,8 +763,8 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
> 		lim->max_hw_discard_sectors = 0;
> 	}
>
>-	virtio_cread(vdev, struct virtio_blk_config,
>-		     zoned.max_append_sectors, &v);
>+	virtio_cread_le(vdev, struct virtio_blk_config,
>+			zoned.max_append_sectors, &v);
> 	if (!v) {
> 		dev_warn(&vdev->dev, "zero max_append_sectors reported\n");
> 		return -ENODEV;
>diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
>index 3744e4da1b2a7..5af2a0300bb9d 100644
>--- a/include/uapi/linux/virtio_blk.h
>+++ b/include/uapi/linux/virtio_blk.h
>@@ -140,11 +140,11 @@ struct virtio_blk_config {
>

To avoid making this mistake again, how about adding a note here to 
clarify that all the fields listed below are defined only for VIRTIO 1.x 
devices and are therefore always little-endian?

Anyway, the patch LGTM:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


> 	/* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
> 	struct virtio_blk_zoned_characteristics {
>-		__virtio32 zone_sectors;
>-		__virtio32 max_open_zones;
>-		__virtio32 max_active_zones;
>-		__virtio32 max_append_sectors;
>-		__virtio32 write_granularity;
>+		__le32 zone_sectors;
>+		__le32 max_open_zones;
>+		__le32 max_active_zones;
>+		__le32 max_append_sectors;
>+		__le32 write_granularity;
> 		__u8 model;
> 		__u8 unused2[3];
> 	} zoned;
>@@ -241,11 +241,11 @@ struct virtio_blk_outhdr {
>  */
> struct virtio_blk_zone_descriptor {
> 	/* Zone capacity */
>-	__virtio64 z_cap;
>+	__le64 z_cap;
> 	/* The starting sector of the zone */
>-	__virtio64 z_start;
>+	__le64 z_start;
> 	/* Zone write pointer position in sectors */
>-	__virtio64 z_wp;
>+	__le64 z_wp;
> 	/* Zone type */
> 	__u8 z_type;
> 	/* Zone state */
>@@ -254,7 +254,7 @@ struct virtio_blk_zone_descriptor {
> };
>
> struct virtio_blk_zone_report {
>-	__virtio64 nr_zones;
>+	__le64 nr_zones;
> 	__u8 reserved[56];
> 	struct virtio_blk_zone_descriptor zones[];
> };
>-- 
>2.53.0
>


^ permalink raw reply

* Re: [PATCH blktests] ublk: mark all tests as QUICK
From: Shin'ichiro Kawasaki @ 2026-06-18  6:20 UTC (permalink / raw)
  To: Sebastian Chlad; +Cc: linux-block, Sebastian Chlad
In-Reply-To: <20260615094144.13060-1-sebastian.chlad@suse.com>

On Jun 15, 2026 / 11:41, Sebastian Chlad wrote:
> These tests are quick to run so mark them accordingly to ensure
> they are included in quick runs.

Thanks, I applied it.

^ permalink raw reply

* [PATCH 1/1] block: validate user space vectors during extraction
From: Keith Busch @ 2026-06-17 23:32 UTC (permalink / raw)
  To: linux-block, linux-fsdevel
  Cc: dm-devel, hch, axboe, brauner, djwong, viro, Keith Busch, stable
In-Reply-To: <20260617233235.1016063-1-kbusch@meta.com>

From: Keith Busch <kbusch@kernel.org>

The blk-mq based drivers have every incoming bio validated by an
unconditional __bio_split_to_limits() call, which rejects any segment
that does not meet the queue's dma_alignment with BLK_STS_INVAL, so they
only see viable requests. A bio-based driver, though, receives a bio
whose memory alignment has not been checked.

Misalignment is possible for vectors supplied from user space direct-io.
When a stacking driver forwards a misaligned bio to a member device,
that member may reject it with BLK_STS_INVAL if the lower level attempts
to split the bio to the queue limits. The stacker tends to mishandle the
error: dm-raid1 may degrade an otherwise healthy array.

Alternatively, some lower level bio based block drivers never attempt to
split their bio and assume the one received is viable. If it's
unaligned, block devices like brd and pmem may corrupt their data as
they have a strong dependency on sector size aligned bvecs.

Validate the source against the device's dma_alignment where the bio is
built from the iov_iter, rejecting misaligned I/O with -EINVAL before it
is submitted. This is done opportunistically in a path that already pins
the pages, so no additional io vector walking is needed.

The required alignment is supplied by the callers as vec_align_mask
(bdev_dma_alignment()); passthrough and the bounce path pass 0 as they
have no such requirement. If a vector is misaligned while building the
bio, any pages already pinned into that bio are released before
returning.

Cc: stable@vger.kernel.org
Fixes: 5ff3f74e145a ("block: simplify direct io validity check")
Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/bio.c          | 19 ++++++++++++++++---
 block/blk-map.c      |  2 +-
 block/fops.c         |  3 ++-
 fs/iomap/direct-io.c |  3 ++-
 include/linux/bio.h  |  2 +-
 include/linux/uio.h  |  3 ++-
 lib/iov_iter.c       |  9 ++++++++-
 7 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index f2a5f4d0a9672..1bd7da889e069 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1242,7 +1242,7 @@ static int bio_iov_iter_align_down(struct bio *bio, struct iov_iter *iter,
  * is returned only if 0 pages could be pinned.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter,
-			   unsigned len_align_mask)
+			   unsigned len_align_mask, unsigned vec_align_mask)
 {
 	iov_iter_extraction_t flags = 0;
 
@@ -1251,6 +1251,11 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter,
 
 	if (iov_iter_is_bvec(iter)) {
 		bio_iov_bvec_set(bio, iter);
+
+		if (mp_bvec_iter_offset(bio->bi_io_vec, bio->bi_iter) &
+							vec_align_mask)
+			return -EINVAL;
+
 		iov_iter_advance(iter, bio->bi_iter.bi_size);
 		return 0;
 	}
@@ -1265,8 +1270,16 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter,
 
 		ret = iov_iter_extract_bvecs(iter, bio->bi_io_vec,
 				BIO_MAX_SIZE - bio->bi_iter.bi_size,
-				&bio->bi_vcnt, bio->bi_max_vecs, flags);
+				&bio->bi_vcnt, bio->bi_max_vecs,
+				vec_align_mask, flags);
 		if (ret <= 0) {
+			if (ret == -EINVAL) {
+				bio_release_pages(bio, false);
+				bio_clear_flag(bio, BIO_PAGE_PINNED);
+				bio->bi_iter.bi_size = 0;
+				bio->bi_vcnt = 0;
+				return ret;
+			}
 			if (!bio->bi_vcnt)
 				return ret;
 			break;
@@ -1377,7 +1390,7 @@ static int bio_iov_iter_bounce_read(struct bio *bio, struct iov_iter *iter,
 		ssize_t ret;
 
 		ret = iov_iter_extract_bvecs(iter, bio->bi_io_vec + 1, len,
-				&bio->bi_vcnt, bio->bi_max_vecs - 1, 0);
+				&bio->bi_vcnt, bio->bi_max_vecs - 1, 0, 0);
 		if (ret <= 0) {
 			if (!bio->bi_vcnt) {
 				folio_put(folio);
diff --git a/block/blk-map.c b/block/blk-map.c
index 768549f19f97e..c9535efe1a913 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -274,7 +274,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	 * No alignment requirements on our part to support arbitrary
 	 * passthrough commands.
 	 */
-	ret = bio_iov_iter_get_pages(bio, iter, 0);
+	ret = bio_iov_iter_get_pages(bio, iter, 0, 0);
 	if (ret)
 		goto out_put;
 	ret = blk_rq_append_bio(rq, bio);
diff --git a/block/fops.c b/block/fops.c
index 15783a6180dec..928ba9be170cd 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -47,7 +47,8 @@ static inline int blkdev_iov_iter_get_pages(struct bio *bio,
 		struct iov_iter *iter, struct block_device *bdev)
 {
 	return bio_iov_iter_get_pages(bio, iter,
-			bdev_logical_block_size(bdev) - 1);
+			bdev_logical_block_size(bdev) - 1,
+			bdev_dma_alignment(bdev));
 }
 
 #define DIO_INLINE_BIO_VECS 4
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b485e3b191daf..645a4e9cd25f9 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -358,7 +358,8 @@ static ssize_t iomap_dio_bio_iter_one(struct iomap_iter *iter,
 				iomap_max_bio_size(&iter->iomap), alignment);
 	else
 		ret = bio_iov_iter_get_pages(bio, dio->submit.iter,
-					     alignment - 1);
+					     alignment - 1,
+					     bdev_dma_alignment(bio->bi_bdev));
 	if (unlikely(ret))
 		goto out_put_bio;
 	ret = bio->bi_iter.bi_size;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8f33f717b14f5..13be7edb524fc 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -477,7 +477,7 @@ int bdev_rw_virt(struct block_device *bdev, sector_t sector, void *data,
 		size_t len, enum req_op op);
 
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter,
-		unsigned len_align_mask);
+		unsigned len_align_mask, unsigned vec_align_mask);
 
 void bio_iov_bvec_set(struct bio *bio, const struct iov_iter *iter);
 void __bio_release_pages(struct bio *bio, bool mark_dirty);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index a9bc5b3067e32..be8b2625b376a 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -391,7 +391,8 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
 			       size_t *offset0);
 ssize_t iov_iter_extract_bvecs(struct iov_iter *iter, struct bio_vec *bv,
 		size_t max_size, unsigned short *nr_vecs,
-		unsigned short max_vecs, iov_iter_extraction_t extraction_flags);
+		unsigned short max_vecs, unsigned align_mask,
+		iov_iter_extraction_t extraction_flags);
 
 /**
  * iov_iter_extract_will_pin - Indicate how pages from the iterator will be retained
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 273919b161617..ccd5b49f6b78d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1886,6 +1886,8 @@ static unsigned int get_contig_folio_len(struct page **pages,
  * @max_size:	maximum size to extract from @iter
  * @nr_vecs:	number of vectors in @bv (on in and output)
  * @max_vecs:	maximum vectors in @bv, including those filled before calling
+ * @align_mask:	reject with -EINVAL if the source address or length is not
+ *		aligned to this mask
  * @extraction_flags: flags to qualify request
  *
  * Like iov_iter_extract_pages(), but returns physically contiguous ranges
@@ -1897,14 +1899,19 @@ static unsigned int get_contig_folio_len(struct page **pages,
  */
 ssize_t iov_iter_extract_bvecs(struct iov_iter *iter, struct bio_vec *bv,
 		size_t max_size, unsigned short *nr_vecs,
-		unsigned short max_vecs, iov_iter_extraction_t extraction_flags)
+		unsigned short max_vecs, unsigned align_mask,
+		iov_iter_extraction_t extraction_flags)
 {
+	unsigned long start = (unsigned long)iter_iov_addr(iter);
 	unsigned short entries_left = max_vecs - *nr_vecs;
 	unsigned short nr_pages, i = 0;
 	size_t left, offset, len;
 	struct page **pages;
 	ssize_t size;
 
+	if ((start | iter_iov_len(iter)) & align_mask)
+		return -EINVAL;
+
 	/*
 	 * Move page array up in the allocated memory for the bio vecs as far as
 	 * possible so that we can start filling biovecs from the beginning
-- 
2.52.0


^ permalink raw reply related

* [PATCH 0/1] direct-io: validate user space vectors during extraction
From: Keith Busch @ 2026-06-17 23:32 UTC (permalink / raw)
  To: linux-block, linux-fsdevel
  Cc: dm-devel, hch, axboe, brauner, djwong, viro, Keith Busch

From: Keith Busch <kbusch@kernel.org>

This addresses the misaligned direct-io problem behind various threads:

 https://lore.kernel.org/linux-xfs/20260610145218.141369-1-cem@kernel.org/
 https://lore.kernel.org/all/CAC_j7i1R7oy+nRhxEjCTba=DUgn02w9X+p94DCu0aHv5+5tKnQ@mail.gmail.com/
 https://lore.kernel.org/linux-block/ai7rnH20IYeSmY8s@gallifrey/
 https://lore.kernel.org/linux-block/20260616154009.2123183-1-kbusch@meta.com/

The various tested fixes are correct as far as they go, but they treat the
symptom: they only matter because an invalid bio reaches those drivers in the
first place.

The reason it reaches them is an assumption I made when I removed
direct-io alignment checks in 5ff3f74e145a ("block: simplify direct io
validity check") and 7eac331869575 ("iomap: simplify direct io validity
check"): every bio is eventually split to the device limits, and the
upper layers cope with resulting errors once the bio has formed. Both
were optimistic assumptions. Drivers with their own ->submit_bio may
never pass through blk_mq_submit_bio()'s split, so the check never runs
for them, and as numerous threads showed, the consumers don't uniformly
handle this condition.

This patch stops the invalid bio at the source instead. It validates the
buffer's alignment against the alignment limits when the bio is built
from the iov_iter. The check is folded into the bvec extraction that
already walks the vectors, so it adds only a comparison on a path that
is pinning direct-io pages anyway. Misalignment is now uniformly
rejected with EINVAL before submission for every direct-io submission
path.
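
For readers following along, the comparison can be sketched in user space
as below. This is only an illustration of the mask test, mirroring the
"(start | len) & align_mask" expression in the lib/iov_iter.c hunk of the
patch; struct seg and seg_is_aligned() are hypothetical names, not kernel
API, and the mask is assumed to be a power of two minus one (as a
dma_alignment mask is).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for one user-supplied vector. */
struct seg {
	uintptr_t addr;	/* start address of the buffer */
	size_t len;	/* length of the buffer in bytes */
};

/*
 * Return true if both the start address and the length are multiples of
 * (align_mask + 1). OR-ing the two values first lets a single AND catch
 * a stray low bit in either one. A mask of 0 accepts everything, which
 * is what the passthrough and bounce paths rely on.
 */
static bool seg_is_aligned(struct seg s, unsigned int align_mask)
{
	/* Assumption: the mask has the form 2^n - 1 (e.g. 511). */
	assert((align_mask & (align_mask + 1)) == 0);

	return (((uintptr_t)s.addr | (uintptr_t)s.len) & align_mask) == 0;
}
```

With a mask of 511, a buffer at 0x1000 of length 4096 passes, while the
same buffer shifted by four bytes, or shortened to 100 bytes, is rejected.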

With this in place, the dm side changes under discussion are no longer
required to fix the bugs: the affected targets simply never see the
invalid bio. The tested patches remain reasonable as defense-in-depth if
desired, but they are not strictly necessary after this.

Keith Busch (1):
  block: validate user space vectors during extraction

 block/bio.c          | 19 ++++++++++++++++---
 block/blk-map.c      |  2 +-
 block/fops.c         |  3 ++-
 fs/iomap/direct-io.c |  3 ++-
 include/linux/bio.h  |  2 +-
 include/linux/uio.h  |  3 ++-
 lib/iov_iter.c       |  9 ++++++++-
 7 files changed, 32 insertions(+), 9 deletions(-)

-- 
2.52.0


^ permalink raw reply

* Re: [PATCH 00/19] init: discoverable root partitions, a.k.a. an omittable "root=" cmdline option
From: Vincent Mailhol @ 2026-06-17 20:56 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jens Axboe, Davidlohr Bueso, Alexander Viro, Jan Kara,
	linux-kernel, linux-block, linux-efi, linux-fsdevel,
	Richard Henderson, Matt Turner, Magnus Lindholm, linux-alpha,
	Vineet Gupta, linux-snps-arc, Russell King, linux-arm-kernel,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui, loongarch,
	Thomas Bogendoerfer, linux-mips, James E.J. Bottomley,
	Helge Deller, linux-parisc, Madhavan Srinivasan, Michael Ellerman,
	linuxppc-dev, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	linux-riscv, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	linux-s390, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <20260617-irritation-rollen-wirst-7d636cbfec92@brauner>

On 17/06/2026 at 14:41, Christian Brauner wrote:
> On Mon, Jun 15, 2026 at 06:08:56PM +0200, Vincent Mailhol wrote:
>> DPS [1] defines GPT partition type UUIDs for OS partitions and
>> attributes that control whether such partitions should be
>> automatically discovered. The specification states that:
>>
>>   The OS can discover and mount the necessary file systems with a
>>   non-existent or incomplete /etc/fstab file and without the root=
>>   kernel command line option.
>>
>> DPS is already implemented in systemd-gpt-auto-generator [2], which,
>> when embedded in an initrd, indeed allows automatic detection of the
>> root filesystem through its partition type UUID.
>>
>> This series adds this discovery feature directly into the kernel so
>> that people who are not using systemd or not using an initrd can still
>> benefit from it. The implementation follows the same model as
>> systemd-gpt-auto-generator:
> 
> I happen to co-maintain the DPS. It is userspace policy and complex
> userspace policy at that and does not belong into the kernel.
> 
> This also implements a really tiny portion of the spec. It deals with a
> lot more complex concepts such as automatic partitioning during
> installation, verity, LUKS, containers. This is really not intended for
> the kernel at all. I mean, it's great that this spec is being used but I
> do not want this in the kernel just for the sake of auto-discovery.

Implementing only a tiny portion is deliberate. To draw a parallel, it
is like saying that the root= cmdline option is a tiny portion of what
an fstab can do.

Yes, it does not handle LUKS, containers and so on, in the same way that
it is not possible to boot those things directly from the kernel.

So, I don't think this conflicts with the actual userland
implementations, the same way you can add root= to your command line and
still have an initrd next to it.

I did not intend to write this as a replacement but just as a complement
to fill the gap for kernels with no initrd.

> The DPS is completely generic and can be implemented by tooling other
> than systemd (util-linux implements it and so does refind iirc). I think
> not wanting to use or build alternative userspace tooling for this is a
> really weak argument for pushing this into the kernel.

Well, let me explain where I am coming from. From time to time, I mess up
my configuration. When the issue is in a userland config file (e.g. a bad
fstab), the recovery is always easy.

But when I mess up the bootloader firmware configuration (e.g. grub,
u-boot, edk2), the fix is always painful. I have to fight with a shell
that I am not familiar with to figure out what the correct
configuration is.

And an initrd would help but:

 - it is still one more file to look for and pass as a parameter
 - on some machines I do not have one anyway

I think it would be very neat to have a method to boot a kernel
with zero config (meaning: no cmdline, no initrd), and I found out
that DPS could achieve that if just a tiny part of it were implemented
in the kernel.

For example, in edk2, I would be able to just browse the disk from the
"Boot from file" menu and select a kernel. Currently it panics because
no configuration is attached. With DPS, we could have it boot linux from
that menu. All in a graphical interface, with just up/down arrows and
one enter keypress.

And this is my motivation. This non-LUKS root read-only part of the DPS
is the only piece which makes sense to me in the kernel. It is not that I
don't *want* to implement it in userland, but that doing so doesn't
achieve what would be helpful to me (and, I guess, to others).

I thought I wouldn't be the only one in the world to see value in that,
which is why I posted it.


Yours sincerely,
Vincent Mailhol


^ permalink raw reply

* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
From: Dr. David Alan Gilbert @ 2026-06-17 16:59 UTC (permalink / raw)
  To: Keith Busch, regressions
  Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
In-Reply-To: <ajLRTkSZJ0WCYNk4@kbusch-mbp>

* Keith Busch (kbusch@kernel.org) wrote:
> On Wed, Jun 17, 2026 at 04:44:35PM +0000, Dr. David Alan Gilbert wrote:
> > (It's a bit scary you're having to go around quite
> > a few places and make similar fixes; I assume there
> > are others that do similar things).
> 
> Yes, I understand that. I'm looking into a common way to validate this.
> The md raid doesn't have this problem because they always call
> bio_split_to_limits() first, but that's not an optimal thing to do for
> dm raid in the normal read/write path, so perhaps a common checker needs
> to happen generically in the block layer. Yeah, I know I removed the
> previous higher level validation ... I'll try to find something less costly
> than what we had before.

OK, thanks again
(and to Thomas for gluing my query to those other two which got this
moving!)

Dave.
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply

