* Re: [PATCH 6.12 000/272] 6.12.92-rc1 review
From: Miguel Ojeda @ 2026-05-29 6:09 UTC (permalink / raw)
To: gregkh
Cc: achill, akpm, broonie, conor, f.fainelli, hargar, jonathanh,
linux-kernel, linux, lkft-triage, patches, patches, pavel,
rwarsow, shuah, sr, stable, sudipm.mukherjee, torvalds,
Miguel Ojeda, Anuj Gupta, Kanchan Joshi, Christoph Hellwig,
Keith Busch, Jens Axboe, linux-block
In-Reply-To: <20260528194629.379955525@linuxfoundation.org>
On Thu, 28 May 2026 21:46:14 +0200 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 6.12.92 release.
> There are 272 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat, 30 May 2026 19:45:52 +0000.
> Anything received after that time might be too late.
Boot-tested under QEMU for Rust x86_64, arm64 and riscv64; built-tested
for loongarch64:
Tested-by: Miguel Ojeda <ojeda@kernel.org>
I am seeing:
In file included from kernel/trace/blktrace.c:23:
In file included from kernel/trace/../../block/blk.h:5:
./include/linux/bio-integrity.h:101:12: error: unused function 'bio_integrity_map_user' [-Werror,-Wunused-function]
101 | static int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
| ^~~~~~~~~~~~~~~~~~~~~~
This looks like it needs:
546d191427cf ("block: make bio_integrity_map_user() static inline")
(and indeed in my run `CONFIG_BLK_DEV_INTEGRITY` is not set like the
commit message says).
Cc: Anuj Gupta <anuj20.g@samsung.com>
Cc: Kanchan Joshi <joshi.k@samsung.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
CC: linux-block@vger.kernel.org
Thanks!
Cheers,
Miguel
^ permalink raw reply
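The build error in the message above follows the usual header-stub pattern: a plain `static` function defined in a header is flagged by `-Wunused-function` in every file that includes the header without calling it, while compilers generally stay quiet about an unused `static inline` function defined in a header. A minimal sketch with hypothetical names (`example_map_user` is not the kernel function, and the `-EINVAL` return value is an assumption made for illustration):

```c
#include <errno.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers to them are used. */
struct bio;
struct iov_iter;

/*
 * Hypothetical stub for a feature that is compiled out. Marking it
 * "static inline" rather than plain "static" is what keeps a header
 * definition from tripping -Wunused-function in files that never call it.
 */
static inline int example_map_user(struct bio *bio, struct iov_iter *iter)
{
	(void)bio;
	(void)iter;
	return -EINVAL;	/* assumed error code for the disabled feature */
}
```

With `-Werror` in effect, the plain `static` form turns every such include into a hard build failure, which matches the failure mode reported here.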
* Re: [PATCH v2 1/2] block: Add bvec_folio()
From: Hannes Reinecke @ 2026-05-29 6:16 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Jens Axboe
Cc: linux-block, linux-kernel, io-uring, linux-mm, Leon Romanovsky,
Christoph Hellwig
In-Reply-To: <20260528175905.1102280-2-willy@infradead.org>
On 5/28/26 19:59, Matthew Wilcox (Oracle) wrote:
> This is a simple helper which replaces page_folio(bvec->bv_page).
> Minor improvement in readability, but the real motivation is to reduce
> the number of references to bvec->bv_page so that it can be changed
> with less work.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Leon Romanovsky <leon@kernel.org>
> ---
> block/bio.c | 6 +++---
> include/linux/bio.h | 2 +-
> include/linux/bvec.h | 15 +++++++++++++++
> io_uring/rsrc.c | 2 +-
> mm/page_io.c | 4 ++--
> 5 files changed, 22 insertions(+), 7 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v2 2/2] block: Include bvec.h kernel-doc in the htmldocs
From: Hannes Reinecke @ 2026-05-29 6:17 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Jens Axboe
Cc: linux-block, linux-kernel, io-uring, linux-mm, Leon Romanovsky,
Christoph Hellwig
In-Reply-To: <20260528175905.1102280-3-willy@infradead.org>
On 5/28/26 19:59, Matthew Wilcox (Oracle) wrote:
> People have gone to the trouble of writing this kernel-doc; the
> least we can do is publish it.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Documentation/core-api/kernel-api.rst | 1 +
> include/linux/bvec.h | 2 ++
> 2 files changed, 3 insertions(+)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 01/12] block: Annotate the queue limits functions
From: Hannes Reinecke @ 2026-05-29 6:23 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal
In-Reply-To: <ecdb2e97d3824a64292be86eacdf87684cc9af85.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> Let the thread-safety checker verify whether every start of a queue
> limits update is followed by a call to a function that finishes a queue
> limits update.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> include/linux/blkdev.h | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 02/12] block/bdev: Annotate the blk_holder_ops callback functions
From: Hannes Reinecke @ 2026-05-29 6:24 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal
In-Reply-To: <b0541615a41fc06f1c0636575db48dd3d7f30e64.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> The four callback functions in blk_holder_ops all release the
> bd_holder_lock. Annotate these functions accordingly.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> include/linux/blkdev.h | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 03/12] block/cgroup: Split blkg_conf_prep()
From: Hannes Reinecke @ 2026-05-29 6:26 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Tejun Heo,
Josef Bacik, Yu Kuai
In-Reply-To: <4e4b11711301f1522fec5fb59de12cbd48f5038f.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> Move the blkg_conf_open_bdev() call out of blkg_conf_prep() to make it
> possible to add lock context annotations to blkg_conf_prep(). Change an
> if-statement in blkg_conf_open_bdev() into a WARN_ON_ONCE() call. Export
> blkg_conf_open_bdev() because it is called by the BFQ I/O scheduler and
> the BFQ I/O scheduler may be built as a kernel module.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/bfq-cgroup.c | 4 ++++
> block/blk-cgroup.c | 18 ++++++++----------
> block/blk-iocost.c | 4 ++++
> 3 files changed, 16 insertions(+), 10 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 04/12] block/cgroup: Split blkg_conf_exit()
From: Hannes Reinecke @ 2026-05-29 6:27 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Tejun Heo,
Josef Bacik, Yu Kuai, Nathan Chancellor
In-Reply-To: <1b128afeb0797264cdad34d0102ae65f9a1e478d.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> Split blkg_conf_exit() into blkg_conf_unprep() and blkg_conf_close_bdev()
> because blkg_conf_exit() is not compatible with the Clang thread-safety
> annotations. Remove blkg_conf_exit(). Rename blkg_conf_exit_frozen() into
> blkg_conf_close_bdev_frozen(). Add thread-safety annotations to the new
> functions.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/bfq-cgroup.c | 9 ++++--
> block/blk-cgroup.c | 57 ++++++++++++++++++------------------
> block/blk-cgroup.h | 6 ++--
> block/blk-iocost.c | 67 +++++++++++++++++++++----------------------
> block/blk-iolatency.c | 19 ++++++------
> block/blk-throttle.c | 34 +++++++++++++---------
> 6 files changed, 101 insertions(+), 91 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 05/12] block/cgroup: Improve lock context annotations
From: Hannes Reinecke @ 2026-05-29 6:27 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Tejun Heo,
Josef Bacik
In-Reply-To: <a30486e2e9695614e4407d5ad8c75d637f5b31db.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> Add lock context annotations where these are missing. Move the
> blkg_conf_prep() annotation into block/blk-cgroup.h to make it visible
> to all blkg_conf_prep() callers.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/blk-cgroup.c | 1 -
> block/blk-cgroup.h | 15 ++++++++++-----
> 2 files changed, 10 insertions(+), 6 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 06/12] block/cgroup: Inline blkg_conf_{open,close}_bdev_frozen()
From: Hannes Reinecke @ 2026-05-29 6:29 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Tejun Heo,
Josef Bacik
In-Reply-To: <f7be976e5a79afc88970ea16158d0a64bbf8f25e.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> The blkg_conf_open_bdev_frozen() calling convention is not compatible
> with lock context annotations. Inline both blkg_conf_open_bdev_frozen()
> and blkg_conf_close_bdev_frozen() because these functions only have a
> single caller. This patch prepares for enabling lock context analysis.
>
> The type of 'memflags' has been changed from unsigned long into unsigned
> int to match the type of current->flags. See also <linux/sched.h>.
>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/blk-cgroup.c | 46 ----------------------------------------------
> block/blk-cgroup.h | 4 ----
> block/blk-iocost.c | 29 +++++++++++++++++++++++------
> 3 files changed, 23 insertions(+), 56 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 07/12] block/crypto: Annotate the crypto functions
From: Hannes Reinecke @ 2026-05-29 6:30 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Eric Biggers,
Nathan Chancellor
In-Reply-To: <91f2acf006f02c42d5d418f3074a6541e79e6fce.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> Add the lock context annotations required for Clang's thread-safety
> analysis.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/blk-crypto-profile.c | 2 ++
> 1 file changed, 2 insertions(+)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 08/12] block/blk-iocost: Add lock context annotations
From: Hannes Reinecke @ 2026-05-29 6:31 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Tejun Heo,
Josef Bacik
In-Reply-To: <669c0e4dd3a8e8e038dff8c19aa61f0359df2bd8.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> Since iocg_lock() and iocg_unlock() both use conditional locking,
> annotate both with __no_context_analysis and use token_context_lock() to
> introduce a new lock context.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/blk-iocost.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 09/12] block/blk-mq-debugfs: Improve lock context annotations
From: Hannes Reinecke @ 2026-05-29 6:34 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Nathan Chancellor
In-Reply-To: <b49bc91945356894b8ab1dc2de1aabad6db3ce16.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> Make the existing lock context annotations compatible with Clang. Add
> the lock context annotations that are missing.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/blk-mq-debugfs.c | 12 ++++++------
> block/blk.h | 4 ++++
> 2 files changed, 10 insertions(+), 6 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 10/12] block/kyber: Make the lock context annotations compatible with Clang
From: Hannes Reinecke @ 2026-05-29 6:35 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Nathan Chancellor
In-Reply-To: <5165d212359b6b2ac70f52917282b85ea6c75fdf.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> While sparse ignores the __acquires() and __releases() arguments, Clang
> verifies these. Make the arguments of __acquires() and __releases()
> acceptable for Clang.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/kyber-iosched.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 11/12] block/mq-deadline: Make the lock context annotations compatible with Clang
From: Hannes Reinecke @ 2026-05-29 6:35 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Nathan Chancellor
In-Reply-To: <be44a8b8ed93792d33b07de74c971d1a8a5703f8.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> While sparse ignores the __acquires() and __releases() arguments, Clang
> verifies these. Make the arguments of __acquires() and __releases()
> acceptable for Clang.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/mq-deadline.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v5 12/12] block: Enable lock context analysis
From: Hannes Reinecke @ 2026-05-29 6:36 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal
In-Reply-To: <e4d44af70627b83fedadd9501609a2eec5d21ec3.1779997063.git.bvanassche@acm.org>
On 5/28/26 21:45, Bart Van Assche wrote:
> Now that all block/*.c files have been annotated, enable lock context
> analysis for all these source files.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/Makefile | 2 ++
> 1 file changed, 2 insertions(+)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH v2] nvme-multipath: set BIO_REMAPPED on bios remapped to per-path namespace disks
From: Hannes Reinecke @ 2026-05-29 6:45 UTC (permalink / raw)
To: Achkinazi, Igor, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
axboe@kernel.dk
Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <DS0PR19MB76963295FC34844B413479F9FD092@DS0PR19MB7696.namprd19.prod.outlook.com>
On 5/28/26 17:24, Achkinazi, Igor wrote:
> When nvme_ns_head_submit_bio() remaps a bio from the multipath head to
> a per-path namespace, bio_set_dev() clears BIO_REMAPPED. The remapped
> bio is then resubmitted through submit_bio_noacct() which calls
> bio_check_eod() because BIO_REMAPPED is not set.
>
> This races with nvme_ns_remove() which zeroes the per-path capacity
> before synchronize_srcu():
>
> CPU 0 (IO submission)
> ---------------------
> srcu_read_lock()
> nvme_find_path() -> ns
> [NVME_NS_READY is set]
>
> CPU 1 (namespace removal)
> -------------------------
> clear_bit(NVME_NS_READY)
> set_capacity(ns->disk, 0)
> synchronize_srcu() <- blocks
>
> CPU 0 (IO submission)
> ---------------------
> bio_set_dev(bio, ns->disk->part0)
> [clears BIO_REMAPPED]
> submit_bio_noacct(bio)
> -> bio_check_eod() sees capacity=0
> -> bio fails with IO error
>
> The SRCU read lock prevents synchronize_srcu() from completing, but
> does not prevent set_capacity(0) from executing. The bio fails the
> EOD check before it reaches the NVMe driver, so nvme_failover_req()
> never gets a chance to redirect it to another path of multipath. IO errors
> are reported to the application despite another path being available.
>
> On older kernels (before commit 0b64682e78f7 "block: skip unnecessary
> checks for split bio"), the same race was also reachable through split
> remainders resubmitted via submit_bio_noacct().
>
> Observed during NVMe multipath failover testing at Dell on
> 5.14.0-570.23.1.el9_6.x86_64 (RHEL 9.7) and
> 6.4.0-150600.23.53-default (SLES 15.6).
>
> Fix this by setting BIO_REMAPPED after bio_set_dev() in
> nvme_ns_head_submit_bio(). This skips bio_check_eod() on the per-path
> device; the EOD check already passed on the multipath head.
>
> NVMe per-path namespace devices are always whole disks (bd_partno=0),
> so the blk_partition_remap() skip also gated by BIO_REMAPPED is a
> no-op. The flag does not persist across failover and cannot go stale
> if the namespace geometry changes between attempts: nvme_failover_req()
> calls bio_set_dev() to redirect the bio back to the multipath head,
> which clears BIO_REMAPPED. When nvme_requeue_work() resubmits through
> submit_bio_noacct(), bio_check_eod() runs normally against the current
> capacity.
>
> Same approach as commit 3a905c37c351 ("block: skip bio_check_eod for
> partition-remapped bios").
>
> A broader solution that moves bio validation into the queue-entered
> context and eliminates the set_capacity(0) hack is being developed
> upstream, however this minimal fix is suitable for backporting to
> stable kernels affected today. The link to the mentioned patch:
> https://lore.kernel.org/linux-block/20260519172326.3462354-1-kbusch@meta.com/
>
> Fixes: a7c7f7b2b641 ("nvme: use bio_set_dev to assign ->bi_bdev")
> Cc: stable@vger.kernel.org
> Signed-off-by: Igor Achkinazi <igor.achkinazi@dell.com>
> ---
> v2:
> - Corrected race description: primary race is in the initial
> submit_bio_noacct() call in nvme_ns_head_submit_bio(), not
> only in split remainders (which are no longer affected on
> current mainline since commit 0b64682e78f7)
> - Dropped incorrect arguments about submit_bio_noacct_nocheck
> export status and BIO_REMAPPED propagation to split clones
> - Added analysis showing BIO_REMAPPED flag does not persist
> across failover (nvme_failover_req clears it via bio_set_dev)
> - Referenced upstream RFC series addressing the root cause
>
> drivers/nvme/host/multipath.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 263161cb8ac0..04f7c7e59945 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -511,6 +511,13 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
> ns = nvme_find_path(head);
> if (likely(ns)) {
> bio_set_dev(bio, ns->disk->part0);
> + /*
> + * Skip bio_check_eod() when this bio enters
> + * submit_bio_noacct() for the per-path device.
> + * The EOD check already passed on the multipath head.
> + */
> + bio_set_flag(bio, BIO_REMAPPED);
> bio->bi_opf |= REQ_NVME_MPATH;
> trace_block_bio_remap(bio, disk_devt(ns->head->disk),
> bio->bi_iter.bi_sector);
> --
> 2.43.0
>
>
> Internal Use - Confidential
>
... or you could introduce __bio_set_dev():
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 97d747320b35..5a2709adeea7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -518,15 +518,20 @@ static inline void blkcg_punt_bio_submit(struct bio *bio)
}
#endif /* CONFIG_BLK_CGROUP */
-static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
+static inline void __bio_set_dev(struct bio *bio, struct block_device *bdev)
{
- bio_clear_flag(bio, BIO_REMAPPED);
if (bio->bi_bdev != bdev)
bio_clear_flag(bio, BIO_BPS_THROTTLED);
bio->bi_bdev = bdev;
bio_associate_blkg(bio);
}
+static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
+{
+ bio_clear_flag(bio, BIO_REMAPPED);
+ __bio_set_dev(bio, bdev);
+}
+
/*
* BIO list management for use by remapping drivers (e.g. DM or MD) and loop.
*
to avoid all this clear-and-set-flag dance.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply related
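The split suggested in the message above can be modeled in plain userspace C. This is only a sketch of the idea, not kernel code: `model_bio`, `model_bdev`, `MODEL_BIO_REMAPPED` and the function names are hypothetical stand-ins, and the blkg association and throttling-flag handling of the real helper are left out. The inner helper assigns the device and leaves the remap flag alone; the outer helper keeps the existing behavior of clearing the flag first.

```c
#include <stdbool.h>

#define MODEL_BIO_REMAPPED (1u << 0)	/* stand-in for BIO_REMAPPED */

struct model_bdev { int id; };

struct model_bio {
	unsigned int flags;
	struct model_bdev *bdev;
};

/* Inner helper: retarget the bio without touching the remap flag. */
static inline void model_bio_set_dev_keep_flags(struct model_bio *bio,
						struct model_bdev *bdev)
{
	bio->bdev = bdev;
}

/* Outer helper: same behavior as before the split, clears the flag first. */
static inline void model_bio_set_dev(struct model_bio *bio,
				     struct model_bdev *bdev)
{
	bio->flags &= ~MODEL_BIO_REMAPPED;
	model_bio_set_dev_keep_flags(bio, bdev);
}

/* Models the submission path: the EOD check is skipped for remapped bios. */
static inline bool model_needs_eod_check(const struct model_bio *bio)
{
	return !(bio->flags & MODEL_BIO_REMAPPED);
}
```

In this model a multipath-style caller would mark the bio as remapped and use the flag-preserving helper for the per-path device, while a failover path would use the clearing variant so that the end-of-device check runs again on resubmission.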
* [PATCH] block: use blk_validate_byte_range() for BLKZEROOUT and BLKSECDISCARD
From: dayou5941 @ 2026-05-29 6:56 UTC (permalink / raw)
To: axboe; +Cc: linux-block, liyouhong
From: liyouhong <liyouhong@kylinos.cn>
blk_validate_byte_range() was extracted from BLKDISCARD in 2024 but
BLKZEROOUT and BLKSECDISCARD still used legacy 512-byte alignment
checks. On 4K logical block devices this allows misaligned requests
to pass ioctl validation, invalidate the page cache, and then fail in
blkdev_issue_zeroout() or blkdev_issue_secure_erase().
Use blk_validate_byte_range() for both ioctls so range validation
matches BLKDISCARD, fallocate, and the blk-lib helpers.
Signed-off-by: liyouhong <liyouhong@kylinos.cn>
---
block/ioctl.c | 29 ++++++++++-------------------
1 file changed, 10 insertions(+), 19 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index ab2c9ed79946..e2bec9acc3b7 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -176,7 +176,7 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
static int blk_ioctl_secure_erase(struct block_device *bdev, blk_mode_t mode,
void __user *argp)
{
- uint64_t start, len, end;
+ uint64_t start, len;
uint64_t range[2];
int err;
@@ -189,15 +189,13 @@ static int blk_ioctl_secure_erase(struct block_device *bdev, blk_mode_t mode,
start = range[0];
len = range[1];
- if ((start & 511) || (len & 511))
- return -EINVAL;
- if (check_add_overflow(start, len, &end) ||
- end > bdev_nr_bytes(bdev))
- return -EINVAL;
+ err = blk_validate_byte_range(bdev, start, len);
+ if (err)
+ return err;
inode_lock(bdev->bd_mapping->host);
filemap_invalidate_lock(bdev->bd_mapping);
- err = truncate_bdev_range(bdev, mode, start, end - 1);
+ err = truncate_bdev_range(bdev, mode, start, start + len - 1);
if (!err)
err = blkdev_issue_secure_erase(bdev, start >> 9, len >> 9,
GFP_KERNEL);
@@ -211,7 +209,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, end, len;
+ uint64_t start, len;
int err;
if (!(mode & BLK_OPEN_WRITE))
@@ -222,21 +220,14 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
start = range[0];
len = range[1];
- end = start + len - 1;
-
- if (start & 511)
- return -EINVAL;
- if (len & 511)
- return -EINVAL;
- if (end >= (uint64_t)bdev_nr_bytes(bdev))
- return -EINVAL;
- if (end < start)
- return -EINVAL;
+ err = blk_validate_byte_range(bdev, start, len);
+ if (err)
+ return err;
/* Invalidate the page cache, including dirty pages */
inode_lock(bdev->bd_mapping->host);
filemap_invalidate_lock(bdev->bd_mapping);
- err = truncate_bdev_range(bdev, mode, start, end);
+ err = truncate_bdev_range(bdev, mode, start, start + len - 1);
if (err)
goto fail;
--
2.25.1
^ permalink raw reply related
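The stricter validation described in the commit message above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel helper: the function name and parameters are hypothetical, the device size and logical block size are passed in directly, and rejecting a zero length is an assumption of the sketch. It shows how a request that is 512-byte aligned but not 4096-byte aligned gets rejected up front on a 4K device.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Userspace model of a byte-range check against a device of dev_bytes
 * bytes with a power-of-two logical block size lbs. All names are
 * hypothetical; rejecting a zero length is an assumption of this sketch.
 */
static inline int model_validate_byte_range(uint64_t dev_bytes, uint32_t lbs,
					    uint64_t start, uint64_t len)
{
	uint64_t mask = (uint64_t)lbs - 1;

	if ((start | len) & mask)	/* both must be block aligned */
		return -EINVAL;
	if (len == 0)			/* assumed: empty ranges are invalid */
		return -EINVAL;
	if (len > UINT64_MAX - start)	/* start + len would wrap around */
		return -EINVAL;
	if (start + len > dev_bytes)	/* range ends past the device */
		return -EINVAL;
	return 0;
}
```

Doing the overflow test before the addition means `start + len` is only evaluated once it is known not to wrap, so the last valid byte `start + len - 1` can be computed safely afterwards.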
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Hillf Danton @ 2026-05-29 7:04 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
Ming Lei, linux-block, LKML, Andrew Morton, Linus Torvalds,
linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner
In-Reply-To: <26717cb6-81b0-4d5d-a5db-669283f9bb9d@I-love.SAKURA.ne.jp>
On Fri, 29 May 2026 09:14:47 +0900 Tetsuo Handa wrote:
>On 2026/05/29 8:00, Hillf Danton wrote:
>>> Given the loop workqueue that triggered the jfs warning, can you specify
>>> the reason why the workqueue in question is NOT flushed while closing disk?
>>>
>> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
>> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
>> And the deadlock can be reproduced by flushing the loop workqueue with
>> disk->open_mutex held [1].
>>
>> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
>> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/
>
>We can avoid the following lockdep warnings (including [1] you mentioned)
>
> https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
> https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
> https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
> https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
> https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4
>
>caused by "drain_workqueue() with disk->open_mutex held" if we assign
>caller-specific lockdep class to disk->open_mutex
>
> https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/
>
>.
>
>Also, we can avoid lockdep warning caused by "drain_workqueue() with disk->open_mutex held" +
>"holding system_transition_mutex" if we forbid binding to pseudo files as backing file
>in the loop driver
>
> https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@I-love.SAKURA.ne.jp
>
>which we can reproduce with
>
> echo 7:0 > /sys/power/resume
> losetup /dev/loop0 /sys/power/resume
> cat /dev/loop0 > /dev/null
> losetup -d /dev/loop0
>
>.
>
>Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
>held" in the loop driver side.
>
Good news.
>
>
>However, the possibility that the last milli-second writeback request
>(which runs during unmount operation) from filesystem fails due to
>
> if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
> return BLK_STS_IOERR;
>
>check in loop_queue_rq() will remain.
This conflicts with "There is no need to destroy the workqueue when
clearing unbinding a loop device from a backing file." in d292dc80686a
>Therefore, addressing this problem
>within individual filesystem will be more strict solution. But guessing from
Conflicts with "Another thing is, if it's some btrfs bios on-the-fly after
close_ctree(), the most common symptom should be NULL pointer
dereference inside various btrfs endio functions." [2] once more.
And you need to pay the fs guys more than two cents I think for cooking
a FIX.
[2] Subject: Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
https://lore.kernel.org/lkml/36571f8a-4df8-4152-b078-d82dbff4ad7e@suse.com/
>the pace jfs fixes bugs, it would take long time before we stop seeing
>this problem...
^ permalink raw reply
* Re: [PATCH 6.12 000/272] 6.12.92-rc1 review
From: Pavel Machek @ 2026-05-29 8:27 UTC (permalink / raw)
To: Miguel Ojeda
Cc: gregkh, achill, akpm, broonie, conor, f.fainelli, hargar,
jonathanh, linux-kernel, linux, lkft-triage, patches, patches,
pavel, rwarsow, shuah, sr, stable, sudipm.mukherjee, torvalds,
Anuj Gupta, Kanchan Joshi, Christoph Hellwig, Keith Busch,
Jens Axboe, linux-block
In-Reply-To: <20260529060918.123155-1-ojeda@kernel.org>
Hi!
> > This is the start of the stable review cycle for the 6.12.92 release.
> > There are 272 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sat, 30 May 2026 19:45:52 +0000.
> > Anything received after that time might be too late.
>
> Boot-tested under QEMU for Rust x86_64, arm64 and riscv64; built-tested
> for loongarch64:
>
> Tested-by: Miguel Ojeda <ojeda@kernel.org>
>
> I am seeing:
>
> In file included from kernel/trace/blktrace.c:23:
> In file included from kernel/trace/../../block/blk.h:5:
> ./include/linux/bio-integrity.h:101:12: error: unused function 'bio_integrity_map_user' [-Werror,-Wunused-function]
> 101 | static int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
> | ^~~~~~~~~~~~~~~~~~~~~~
>
We see that, too:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/14592368004
We don't see the problem on 6.6, 6.18 or 7.0-stable.
Best regards,
Pavel
^ permalink raw reply
* Re: [PATCH v6 1/4] block: add task-context bio completion infrastructure
From: Sebastian Andrzej Siewior @ 2026-05-29 8:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Tal Zussman, Jens Axboe, Matthew Wilcox (Oracle),
Christian Brauner, Darrick J. Wong, Carlos Maiolino,
Alexander Viro, Jan Kara, Dave Chinner, Bart Van Assche,
linux-block, linux-kernel, linux-xfs, linux-fsdevel, linux-mm,
Gao Xiang, Clark Williams, Steven Rostedt, linux-rt-devel
In-Reply-To: <ahPdDtu3vXfNpb__@infradead.org>
On 2026-05-24 22:24:30 [-0700], Christoph Hellwig wrote:
> > > + local_lock_irqsave(&bio_complete_batch.lock, flags);
> >
> > Q: "Is it safe to use local_lock_irqsave() here when called from an atomic
> > context?
> > On CONFIG_PREEMPT_RT kernels, local_lock_t maps to a per-CPU spinlock_t,
> > which is a sleepable rt_mutex. Since __bio_complete_in_task() is specifically
> > called when bio_in_atomic() is true (which includes hardware interrupts or
> > execution under a raw_spinlock_t), attempting to acquire a sleepable lock
> > here would trigger an "Invalid wait context" lockdep warning.
> > Would a lockless list (llist) be more appropriate here to avoid sleeping
> > in atomic contexts?"
> >
> > A: This seems legit, but I'm not super familiar with PREEMPT_RT. I don't want
> > to switch to raw_spinlock_t, as it seems like that would add unnecessary
> > overhead on non-PREEMPT_RT kernels. I think switching to use local_irq_save()
> > (as is done for the per-CPU bio allocation cache) should work.
>
> Adding the PREEMPT_RT maintainers for this as it is above my pay grade.
The local_lock_irqsave() seems to come from __bio_complete_in_task()
which sounds like preemptible context in general. So yes, using it is
safe.
It should even be safe with interrupt handlers and so on since they are
threaded in general.
There is this new bio_in_atomic(), which looks a bit odd for detecting
softirq context as claimed in the comment. Anyway, on PREEMPT_RT
rcu_preempt_depth() will work as intended. The preemptible() shouldn't
return false because softirq handling is not recorded in the preempt
counter, interrupts are forced-threaded so you should never be in hardirq
context and things like spin_lock_irq() don't disable interrupts. So
unless you do local_irq_save() or preempt_disable() you should remain
preemptible (even in softirq). This might or might not do what you want.
> > Q: "Does creating this workqueue with WQ_MEM_RECLAIM break the local_lock_irq()
> > protection in bio_complete_work_fn()?
> > When a workqueue has WQ_MEM_RECLAIM, it spawns a global rescuer thread that
> > can execute per-CPU work items during worker pool congestion. This rescuer
> > thread executes unbound, meaning it could run on CPU B while processing
> > CPU A's work item.
> > Since local_lock operates strictly on the currently executing CPU, the
> > rescuer thread on CPU B would acquire CPU B's lock, while popping elements
> > from CPU A's list (derived via container_of()).
> > If an interrupt on CPU A concurrently calls __bio_complete_in_task(),
> > it will acquire CPU A's lock and modify the same list without mutual
> > exclusion, potentially causing list corruption."
> >
> > A: The rescuer should run on the same CPU, not unbound, so this is not an
> > issue.
>
> This is another area where the PREEMPT_RT/scheduler folks might be able
> to help.
Not sure what the question is. WQ_MEM_RECLAIM is one thing, WQ_UNBOUND/
WQ_PERCPU another.
bio_complete_wq is WQ_MEM_RECLAIM | WQ_PERCPU. So it will run on the
requested CPU. The container_of() and local_lock() look like they will
access the same thing but having a WARN_ON() if they don't would be a
blessing. Or just use this_cpu in the worker function to avoid all this.
The need_resched() check in bio_complete_work_fn() is a bit odd. So if
need_resched() is true then you want to leave (and you care about !PREEMPT
kernels). On PREEMPT_LAZY you could just continue as there would be
preemption sooner or later.
The bio_list_empty() check below is futile. If it is empty then you
leave doing nothing (so you could just leave without the check).
If there is an item, then the enqueue "thread" should have invoked
mod_delayed_work_on(, 1) claiming the work. That means the
mod_delayed_work_on(, 0) in this function should do nothing because the
work is already claimed (so you could just leave skipping the extra
work).
Sebastian
^ permalink raw reply
* [PATCH 6.12.y] block: make bio_integrity_map_user() static inline
From: Ruslan Valiyev @ 2026-05-29 8:53 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, Sasha Levin, Jens Axboe, Kanchan Joshi,
Anuj Gupta, Keith Busch, linux-block, linux-kernel,
kernel test robot, kernelci . org bot, Ruslan Valiyev
From: Jens Axboe <axboe@kernel.dk>
If CONFIG_BLK_DEV_INTEGRITY isn't set, then the dummy helper must be
static inline to avoid complaints about the function being unused.
Fixes: fe8f4ca7107e ("block: modify bio_integrity_map_user to accept iov_iter as argument")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411300229.y7h60mDg-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 546d191427cf5cf3215529744c2ea8558f0279db)
The same build break is now reported on stable-rc/linux-6.12.y by KernelCI.
Verified the upstream cherry-pick applies cleanly on top of 97928cc88900a
(Linux 6.12.92-rc1) and that block/bdev.o + block/fops.o compile cleanly
afterwards with i386_defconfig (which leaves CONFIG_BLK_DEV_INTEGRITY
unset). Without the fix both files trip:
include/linux/bio-integrity.h:101:12: error: 'bio_integrity_map_user' defined but not used [-Werror=unused-function]
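For readers following the build break, here is a minimal userspace sketch of the pattern the commit describes. A plain `static` function defined in a header becomes a private, possibly unused definition in every file that includes it, while compilers do not raise the unused-function warning for a `static inline` stub in a header. The `DEMO_BLK_DEV_INTEGRITY` macro and `demo_map_user_without_integrity()` are hypothetical names used only for illustration, and the two struct declarations are stand-ins for the kernel types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only pointers are used. */
struct bio;
struct iov_iter;

#ifdef DEMO_BLK_DEV_INTEGRITY
/* Feature built in: the real implementation lives in another file. */
int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter);
#else
/*
 * Feature configured out: a dummy helper. Marking it "static inline"
 * rather than plain "static" keeps includers that never call it from
 * tripping the unused-function warning.
 */
static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
{
	(void)bio;
	(void)iter;
	return -EINVAL;
}
#endif

/* Hypothetical caller: with the feature off, the stub reports -EINVAL. */
int demo_map_user_without_integrity(void)
{
	int ret = bio_integrity_map_user(NULL, NULL);

	assert(ret == -EINVAL);
	return ret;
}
```

This would also explain why builds with CONFIG_BLK_DEV_INTEGRITY=y do not hit the error: in that configuration the stub branch is never compiled.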
Reported-by: kernelci.org bot <bot@kernelci.org>
Closes: https://lore.kernel.org/all/178000554419.7114.5687032601791586484@330cfa3079ca/
Closes: https://lore.kernel.org/all/178000194591.7095.11275948264529325340@330cfa3079ca/
Signed-off-by: Ruslan Valiyev <linuxoid@gmail.com>
---
include/linux/bio-integrity.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h
index be91479b2c42d..53f6dbd2816e0 100644
--- a/include/linux/bio-integrity.h
+++ b/include/linux/bio-integrity.h
@@ -98,7 +98,7 @@ static inline void bioset_integrity_free(struct bio_set *bs)
{
}
-static int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
+static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
{
return -EINVAL;
}
base-commit: 97928cc88900a9fb07a4dddbd1db19eb0ce55c56
--
2.43.0
^ permalink raw reply related
* Re: [PATCH 6.12 000/272] 6.12.92-rc1 review
From: Peter Schneider @ 2026-05-29 10:59 UTC (permalink / raw)
To: Miguel Ojeda, gregkh
Cc: achill, akpm, broonie, conor, f.fainelli, hargar, jonathanh,
linux-kernel, linux, lkft-triage, patches, patches, pavel,
rwarsow, shuah, sr, stable, sudipm.mukherjee, torvalds,
Anuj Gupta, Kanchan Joshi, Christoph Hellwig, Keith Busch,
Jens Axboe, linux-block
In-Reply-To: <20260529060918.123155-1-ojeda@kernel.org>
On 29.05.2026 at 08:09, Miguel Ojeda wrote:
> On Thu, 28 May 2026 21:46:14 +0200 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>>
>> This is the start of the stable review cycle for the 6.12.92 release.
>> There are 272 patches in this series, all will be posted as a response
>> to this one. If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Sat, 30 May 2026 19:45:52 +0000.
>> Anything received after that time might be too late.
>
> Boot-tested under QEMU for Rust x86_64, arm64 and riscv64; built-tested
> for loongarch64:
>
> Tested-by: Miguel Ojeda <ojeda@kernel.org>
>
> I am seeing:
>
> In file included from kernel/trace/blktrace.c:23:
> In file included from kernel/trace/../../block/blk.h:5:
> ./include/linux/bio-integrity.h:101:12: error: unused function 'bio_integrity_map_user' [-Werror,-Wunused-function]
> 101 | static int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
> | ^~~~~~~~~~~~~~~~~~~~~~
>
> This looks like it needs:
>
> 546d191427cf ("block: make bio_integrity_map_user() static inline")
>
> (and indeed in my run `CONFIG_BLK_DEV_INTEGRITY` is not set like the
> commit message says).
I didn't see this error message on x86_64; I built with CONFIG_WERROR=y and CONFIG_BLK_DEV_INTEGRITY=y
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Best regards,
Peter Schneider
--
Climb the mountain not to plant your flag, but to embrace the challenge,
enjoy the air and behold the view. Climb it so you can see the world,
not so the world can see you. -- David McCullough Jr.
OpenPGP: 0xA3828BD796CCE11A8CADE8866E3A92C92C3FF244
Download: https://www.peters-netzplatz.de/download/pschneider1968_pub.asc
https://keys.mailvelope.com/pks/lookup?op=get&search=pschneider1968@googlemail.com
https://keys.mailvelope.com/pks/lookup?op=get&search=pschneider1968@gmail.com
^ permalink raw reply
* Re: [PATCH 6.12 000/272] 6.12.92-rc1 review
From: Sasha Levin @ 2026-05-29 12:44 UTC (permalink / raw)
To: Miguel Ojeda
Cc: Sasha Levin, gregkh, achill, akpm, broonie, conor, f.fainelli,
hargar, jonathanh, linux-kernel, linux, lkft-triage, patches,
patches, pavel, rwarsow, shuah, sr, stable, sudipm.mukherjee,
torvalds, Anuj Gupta, Kanchan Joshi, Christoph Hellwig,
Keith Busch, Jens Axboe, linux-block
In-Reply-To: <ahlN6TPTgMwBT9_d@duo.ucw.cz>
On Fri, May 29, 2026 at 10:27:21AM +0200, Pavel Machek wrote:
> > I am seeing:
> >
> > ./include/linux/bio-integrity.h:101:12: error: unused function 'bio_integrity_map_user' [-Werror,-Wunused-function]
> >
> > This looks like it needs:
> >
> > 546d191427cf ("block: make bio_integrity_map_user() static inline")
> >
> We see that, too:
> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/14592368004
> We don't see the problem on 6.6, 6.18 or 7.0-stable.
Thanks! I've queued up 546d191427cf ("block: make bio_integrity_map_user()
static inline").
--
Thanks,
Sasha
^ permalink raw reply
* Re: [PATCH v7 17/43] btrfs: add get_devices hook for fscrypt
From: Daniel Vacek @ 2026-05-29 14:51 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chris Mason, Josef Bacik, Eric Biggers, Theodore Y. Ts'o,
Jaegeuk Kim, Jens Axboe, David Sterba, linux-block, linux-fscrypt,
linux-btrfs, linux-kernel, Sweet Tea Dorminy
In-Reply-To: <ahBJdSKMly8rv04F@infradead.org>
On Fri, 22 May 2026 at 14:17, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, May 22, 2026 at 02:00:28PM +0200, Daniel Vacek wrote:
> > > How does this handle adding/removing devices at runtime?
> >
> > When called, this callback returns the list of bdevs opened by the
> > given superblock. If devices are added or removed, this function
> > returns a different list.
> > In other words it always returns a valid list.
> >
> > This is called from `fscrypt_get_devices()`, which is called from
> > `fscrypt_select_encryption_impl()` or
> > `fscrypt_prepare_inline_crypt_key()` or
> > `fscrypt_destroy_inline_crypt_key()`. All these functions walk the
> > returned list and discard it immediately afterwards.
> >
> > Note that with btrfs at this point we're only using the inline crypto fallback.
> > Is there any particular reason you asked this question?
>
> Well, assume you have a single device fs, and then you add a device
> later, you will not get the blk_crypto_config_supported call for this
> device, and it will not be taken into account.
This function is called from `fscrypt_prepare_new_inode()` from
`btrfs_new_inode_prepare()` as well as from many other places.
It looks quite OK to me and I can also confirm this with tracing.
Using the following bpftrace script:
```
fr:fscrypt_get_devices {
// $num_devs = args.num_devs[0];
$num_devs = ((uint32 *)args.num_devs)[0];
// if ($num_devs < 2) { return; }
printf("%s()\t\t\t(%4d %13s[%d])\tnum_devs %d\n", func,
cpu, curtask->comm, curtask->pid, $num_devs);
}
f:blk_crypto_config_supported {
printf("%s()\t\t(%4d %13s[%d])\tbdev %18p\n", func,
cpu, curtask->comm, curtask->pid, args.bdev);
}
```
... and mounting an encrypted FS, then adding an additional device, like this:
```
$ mount /dev/vdb /mnt/scratch; \
echo -ne $TEST_RAW_KEY | xfs_io -c add_enckey /mnt/scratch; \
touch /mnt/scratch/dir/foo; \
btrfs device add /dev/vdc /mnt/scratch; \
touch /mnt/scratch/dir/bar
```
I'm getting this:
```
fscrypt_get_devices() ( 5 touch[26840]) num_devs 1
blk_crypto_config_supported() ( 5 touch[26840]) bdev 0xffff88a9c33fc880
fscrypt_get_devices() ( 5 touch[26840]) num_devs 1
fscrypt_get_devices() ( 5 touch[26844]) num_devs 2
blk_crypto_config_supported() ( 5 touch[26844]) bdev 0xffff88a9c3262b80
blk_crypto_config_supported() ( 5 touch[26844]) bdev 0xffff88a9c33fc880
fscrypt_get_devices() ( 5 touch[26844]) num_devs 2
```
Here you can see the newly added device is being considered.
Moreover btrfs only supports the fallback encryption due to the need
to compute the checksums of encrypted data stored on the device.
> Now can btrfs even support hardware inline encryption? The way the bio
> processing is special cased I somehow doubt it. But the concept of a
> static device list just doesn't work for btrfs, so I think the fscrypt
> side of this will need refactoring not to rely on it. If we never
> support hardware inline encryption on such dynamic file systems that
> would be relatively easy; if we need to support that case things might
> get a lot more complicated.
Yeah, this depends. If the device or fscrypt could return the checksum
to the FS, btrfs could use the inline HW encryption. Note that the
checksum must also be one that btrfs supports.
Otherwise we need to get the encrypted data to compute the checksum
ourselves. That is precisely why only fallback encryption is currently
supported. And it's where the FS callback hook is used to compute the
checksum.
--nX
^ permalink raw reply
* [REPORT] nvmet-rdma: integer overflow in inline-data SGL bounds check -> pre-auth kernel-memory read + remote crash (candidate patch inline)
From: hexlabsecurity @ 2026-05-29 6:52 UTC (permalink / raw)
To: security@kernel.org
Cc: hch@lst.de, sagi@grimberg.me, kbusch@kernel.org, kch@nvidia.com,
linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
linux-block@vger.kernel.org
Hello,
I would like to report an integer-overflow vulnerability in the NVMe-oF
RDMA target (drivers/nvme/target/rdma.c). The inline-data SGL bounds
check in nvmet_rdma_map_sgl_inline() is computed in u64 over two
host-controlled values and wraps, which a remote fabric peer can use
both to read kernel memory back over the fabric and to crash the target.
== Affected ==
drivers/nvme/target/rdma.c, nvmet_rdma_map_sgl_inline()
Verified present on the current mainline tree (commit 27fa82620cba,
~v7.1-rc5), at the bounds check:
static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
{
struct nvme_sgl_desc *sgl = &rsp->req.cmd->common.dptr.sgl;
u64 off = le64_to_cpu(sgl->addr); /* host-controlled, 64-bit */
u32 len = le32_to_cpu(sgl->length); /* host-controlled, 32-bit */
...
if (off + len > rsp->queue->dev->inline_data_size) { /* u64 wrap */
pr_err("invalid inline data offset!\n");
return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
}
...
nvmet_rdma_use_inline_sg(rsp, len, off);
}
"off + len" is evaluated in u64 and wraps modulo 2^64. For example
addr = 0xfffffffffffffe00, length = 0x1000 makes the sum wrap to
0xe00, which is <= inline_data_size (default PAGE_SIZE), so the check
passes. The current check form (against the per-port inline_data_size)
and the fixed-size inline_sg[NVMET_RDMA_MAX_INLINE_SGE] array with the
num_pages(len) loop were introduced together by commit 0d5ee2b2ab4f
("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data"), which is the
Fixes: I used. Note: the single-page inline path that predates that
commit may have an analogous u64-overflow read in a different code
shape; I would appreciate the maintainers' judgement on whether the
stable backport scope should reach before that commit.
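The wrap described above can be reproduced in userspace. The following is only a sketch of the arithmetic, not the driver code: `DEMO_INLINE_DATA_SIZE` assumes a 4096-byte page for the default inline_data_size, and `demo_unchecked_rejects()` is a hypothetical helper that mirrors the shape of the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: inline_data_size at its default of one 4096-byte page. */
#define DEMO_INLINE_DATA_SIZE 4096u

/*
 * Same shape as the unchecked comparison: returns true when the
 * descriptor would be rejected. The sum is computed in 64 bits and
 * silently wraps modulo 2^64.
 */
bool demo_unchecked_rejects(uint64_t off, uint32_t len)
{
	return off + len > DEMO_INLINE_DATA_SIZE;
}

/* Walks through the example values quoted in this report. */
void demo_unchecked_examples(void)
{
	/* 0xfffffffffffffe00 + 0x1000 wraps to 0xe00, so the check passes. */
	assert((uint64_t)(0xfffffffffffffe00ULL + 0x1000u) == 0xe00u);
	assert(!demo_unchecked_rejects(0xfffffffffffffe00ULL, 0x1000u));

	/* A plainly oversized request without a wrap is still rejected. */
	assert(demo_unchecked_rejects(0x1000u, 0x1000u));
}
```

The point of the sketch is that rejection depends only on the wrapped sum, so a huge offset paired with a suitable length looks like a small in-bounds request.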
== Two consequences of the bypass ==
1. Kernel-memory read (information disclosure).
nvmet_rdma_use_inline_sg() does "sg->offset = off", truncating the
64-bit offset to scatterlist::offset (unsigned int). The block
layer then accesses page_to_phys(inline_page) + (off & 0xffffffff),
so the target reads up to inline_data_size bytes of kernel memory
per write command and returns them to the host on read-back, or
faults the in-kernel copy if the offset lands on unmapped memory.
2. Kernel-memory corruption -> remote crash (denial of service).
A large length makes "sg_count = num_pages(len)" in
nvmet_rdma_use_inline_sg() exceed NVMET_RDMA_MAX_INLINE_SGE (4), so
the loop writes scatterlist entries past the fixed-size inline_sg[]
array, corrupting the surrounding command object.
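Both consequences come down to simple arithmetic, sketched below in userspace C. This is an illustration under stated assumptions, not the driver code: `DEMO_PAGE_SIZE` assumes 4 KiB pages, `demo_num_pages()` uses a plain ceiling division as a stand-in for the driver's num_pages(), and `demo_truncate_offset()` models storing a u64 into an unsigned int field.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE      4096u /* assumption: 4 KiB pages */
#define DEMO_MAX_INLINE_SGE 4u    /* the limit of 4 quoted in this report */

/* Models "sg->offset = off": only the low bits of the u64 survive. */
unsigned int demo_truncate_offset(uint64_t off)
{
	return (unsigned int)off;
}

/* Stand-in for num_pages(): pages needed to hold len bytes, rounded up. */
unsigned int demo_num_pages(uint32_t len)
{
	return (unsigned int)(((uint64_t)len + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE);
}

/* Walks through the example values quoted in this report. */
void demo_consequences(void)
{
	/* Read case: the stored offset becomes 0xfffffe00. */
	assert(demo_truncate_offset(0xfffffffffffffe00ULL) == 0xfffffe00u);

	/* Crash case: 0x10000 bytes need 16 entries, 12 more than the array holds. */
	assert(demo_num_pages(0x10000u) == 16u);
	assert(demo_num_pages(0x10000u) - DEMO_MAX_INLINE_SGE == 12u);
}
```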
== Reachability ==
The path is reached by any write command carrying an inline SGL, i.e.
after a Fabrics Connect. On a subsystem configured with
attr_allow_any_host=1 it is reachable WITHOUT authentication by any
RDMA peer (RoCE/iWARP/IB) that can reach the target's listener. With
DH-CHAP configured, or attr_allow_any_host=0 with an unknown host NQN,
a valid/known host NQN is required first.
== Empirical reproduction ==
Reproduced against a stock nvmet-rdma target over a soft-iWARP (siw)
fabric on a Linux 6.12.90 build with KASAN (KASAN_INLINE):
- Read: a single write command with addr = 0xfffffffffffffe00,
length = 0x1000 produced a KASAN out-of-bounds read and returned
~4 KiB of kernel memory (including kernel .text) into the
attacker-readable namespace.
- Crash: a write command with addr = 0xffffffffffff0500,
length = 0x10000 (sum wraps to 0x500 <= inline_data_size, but
num_pages(0x10000) = 16 writes 16 scatterlist entries into the
4-entry inline_sg[], 12 past its end) deterministically corrupted
the command object and oopsed the target:
Oops: general protection fault [...] KASAN: null-ptr-deref
RIP: nvmet_rdma_post_recv+0x... [nvmet_rdma]
nvmet_rdma_post_recv <- nvmet_rdma_queue_response
<- __nvmet_req_complete <- nvmet_check_transfer_len
<- nvmet_rdma_handle_command <- ib_cq_poll_work
Every reconnect re-triggers it (persistent remote DoS). The
nvmet_rdma_cmd objects are carved from one contiguous kcalloc'd
array, so the over-long entry write stays within that allocation and
KASAN flags the downstream dereference of the corrupted command in
nvmet_rdma_post_recv rather than the store itself. The out-of-bounds
content is not attacker-controlled, so this is a crash/corruption
primitive, not a controlled write; I do not see a path to remote code
execution from this bug.
Severity estimate. The two consequences arise from different inline-SGL
capsules (small vs large length) and are scored as separate single-capsule
outcomes, not one combined vector:
OOB read (info-disclosure): CVSS 7.5 HIGH
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
OOB write (corruption/DoS): CVSS 8.2 HIGH
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:H
Headline 8.2 HIGH (both reachable pre-auth with attr_allow_any_host=1).
With attr_allow_any_host=0 a valid host NQN is required first (PR:L),
lowering these to 6.5 and 7.1.
== Suggested fix ==
Validate the offset with check_add_overflow() before comparing against
inline_data_size. A passing check then guarantees
off + len <= inline_data_size <= NVMET_RDMA_MAX_INLINE_DATA_SIZE, which
bounds both the truncated scatterlist::offset and
num_pages(len) <= NVMET_RDMA_MAX_INLINE_SGE, closing the read and the
inline_sg[] overflow together. Candidate patch inline below (applies
to current mainline).
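The effect of the proposed check can be sketched in userspace as follows. This is a hedged illustration, not the patch itself: it uses the GCC/Clang builtin `__builtin_add_overflow()` as a stand-in for the kernel's check_add_overflow(), `DEMO_INLINE_DATA_SIZE` assumes a 4096-byte inline_data_size, and `demo_checked_rejects()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: inline_data_size at its default of one 4096-byte page. */
#define DEMO_INLINE_DATA_SIZE 4096u

/*
 * Overflow-aware bounds check: returns true when the descriptor would
 * be rejected, either because off + len wraps or because the sum
 * exceeds the inline data size.
 */
bool demo_checked_rejects(uint64_t off, uint32_t len)
{
	uint64_t bound;

	if (__builtin_add_overflow(off, (uint64_t)len, &bound))
		return true;
	return bound > DEMO_INLINE_DATA_SIZE;
}

/* Walks through the two capsules from this report plus benign requests. */
void demo_checked_examples(void)
{
	/* Both wrapping capsules are now rejected. */
	assert(demo_checked_rejects(0xfffffffffffffe00ULL, 0x1000u));
	assert(demo_checked_rejects(0xffffffffffff0500ULL, 0x10000u));

	/* Benign inline writes that fit are still accepted. */
	assert(!demo_checked_rejects(0, 0x1000u));
	assert(!demo_checked_rejects(0x800u, 0x800u));
}
```

Because a passing check implies len itself is no larger than the inline data size, the page count derived from len stays within the fixed array in this model as well.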
== Embargo ==
I am happy to follow the standard process. Proposing a 7-day embargo;
the fix is small and I can adjust as the maintainers prefer. I have
not notified linux-distros and will hold that until a public patch
lands, per the usual guidance.
I am an independent security researcher; please credit
"Bryam Vargas <hexlabsecurity@proton.me>" (Reported-by already in the
patch). Affiliation: HEXLAB SAS (registration pending) -- Cali,
Colombia. Happy to provide the full reproduction harness on request.
Thank you,
Bryam Vargas
----- candidate patch (inline, plain text) -----
From 448c122c744430c1c2926d635855a3894370ee33 Mon Sep 17 00:00:00 2001
From: Bryam Vargas <hexlabsecurity@proton.me>
Date: Thu, 28 May 2026 21:23:52 -0500
Subject: [PATCH] nvmet-rdma: fix integer overflow in inline data SGL bounds
check
nvmet_rdma_map_sgl_inline() bounds-checks the inline data descriptor
with both operands host-controlled and the sum evaluated in u64:
u64 off = le64_to_cpu(sgl->addr);
u32 len = le32_to_cpu(sgl->length);
...
if (off + len > rsp->queue->dev->inline_data_size)
return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
"off + len" therefore wraps modulo 2^64. A descriptor with, for
example, addr = 0xfffffffffffffe00 and length = 0x1000 makes the sum
wrap to 0xe00, which passes the inline_data_size check. An inline-SGL
write command reaches this path after a Fabrics Connect; on a subsystem
with attr_allow_any_host set it is reachable without authentication by
any peer that can reach the target.
Two distinct out-of-bounds accesses follow from the bypass:
- nvmet_rdma_use_inline_sg() stores the 64-bit offset into
scatterlist::offset, which is unsigned int, committing the truncated
attacker offset to the inline page. The block layer then accesses
page_to_phys(inline_page) + (off & 0xffffffff), reading up to
inline_data_size bytes of kernel memory per command back to the host
(or faulting the target if the offset lands on unmapped memory).
- A large len makes sg_count = num_pages(len) in
nvmet_rdma_use_inline_sg() exceed NVMET_RDMA_MAX_INLINE_SGE, so the
loop writes scatterlist entries past the fixed-size inline_sg[]
array, corrupting the surrounding command object and oopsing the
target on the next use of that command.
Validate the offset with check_add_overflow() before comparing against
inline_data_size. A passing check then guarantees
off + len <= inline_data_size <= NVMET_RDMA_MAX_INLINE_DATA_SIZE, which
bounds both the truncated scatterlist::offset and
num_pages(len) <= NVMET_RDMA_MAX_INLINE_SGE, closing the out-of-bounds
read and the inline_sg[] overflow together.
Reported-by: Bryam Vargas <hexlabsecurity@proton.me>
Fixes: 0d5ee2b2ab4f ("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
Review context (not for the commit log):
Reproducer -- unprivileged remote RDMA peer against a target with
attr_allow_any_host=1, a single inline-SGL WRITE capsule:
* OOB read: sgl->addr=0xfffffffffffffe00, sgl->length=0x1000
(off+len wraps to 0xe00 <= inline_data_size; sg->offset
truncates to 0xfffffe00) -> ~4 KiB of kernel memory is
read back from the namespace.
* OOB write: sgl->addr=0xffffffffffff0500, sgl->length=0x10000
(num_pages(0x10000)=16 overruns the 4-entry inline_sg[])
-> target memory corruption / crash.
A/B-tested on a 6.12.90 KASAN lab kernel (same .config, only this hunk
differs): pre-fix the OOB-read capsule trips "KASAN: use-after-free in
copy_page_from_iter_atomic" via nvmet_file_execute_io; post-fix both
capsules are rejected with "invalid inline data offset!"
(NVME_SC_SGL_INVALID_OFFSET), benign inline writes still succeed, and no
KASAN/oops fires. The fix decides identically in 32- and 64-bit builds
(check_add_overflow operates on u64).
drivers/nvme/target/rdma.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e6e2c3f9afdf..a5bbf9d41c3b 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -12,6 +12,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/nvme.h>
+#include <linux/overflow.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/wait.h>
@@ -847,6 +848,7 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
struct nvme_sgl_desc *sgl = &rsp->req.cmd->common.dptr.sgl;
u64 off = le64_to_cpu(sgl->addr);
u32 len = le32_to_cpu(sgl->length);
+ u64 bound;
if (!nvme_is_write(rsp->req.cmd)) {
rsp->req.error_loc =
@@ -854,7 +856,8 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
- if (off + len > rsp->queue->dev->inline_data_size) {
+ if (check_add_overflow(off, (u64)len, &bound) ||
+ bound > rsp->queue->dev->inline_data_size) {
pr_err("invalid inline data offset!\n");
return NVME_SC_SGL_INVALID_OFFSET | NVME_STATUS_DNR;
}
--
2.43.0
^ permalink raw reply related