* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Hillf Danton @ 2026-05-28 5:43 UTC (permalink / raw)
To: Ming Lei
Cc: Tetsuo Handa, Jens Axboe, Bart Van Assche, Christoph Hellwig,
Damien Le Moal, linux-block, LKML, Andrew Morton, Linus Torvalds,
linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner
In-Reply-To: <ahZeYQ0cLE1i8TGs@fedora>
On Tue, 26 May 2026 22:00:49 -0500 Ming Lei wrote:
>On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote:
>> On 2026/05/27 10:20, Ming Lei wrote:
>> >> Of course we should try to figure out the root cause first, but how can we do?
>> >
>> > Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious,
>> > which may cause data loss, so CC btrfs list and maintainer.
>>
>> Why do you assume that the culprit is btrfs?
>>
>> https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 indicated that
>> this similar race is also happening with jfs.
>
> I just didn't see the above report on jfs.
>
> It doesn't change anything, the same question still stands: unexpected write IO is issued
> or crosses umount & last closing of loop disk.
>
Given that the loop workqueue triggered the jfs warning, can you explain
why the workqueue in question is NOT flushed when the disk is closed?
^ permalink raw reply
* Re: [PATCH] block: mark biovec_init_pool static
From: Hannes Reinecke @ 2026-05-28 6:05 UTC (permalink / raw)
To: Christoph Hellwig, axboe; +Cc: linux-block
In-Reply-To: <20260527150646.2349405-1-hch@lst.de>
On 5/27/26 17:06, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/bio.c | 2 +-
> include/linux/bio.h | 1 -
> 2 files changed, 1 insertion(+), 2 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* Re: [PATCH] block: add a bio_endio_status helper
From: Hannes Reinecke @ 2026-05-28 6:56 UTC (permalink / raw)
To: Christoph Hellwig, axboe; +Cc: linux-block
In-Reply-To: <20260527151247.2352145-1-hch@lst.de>
On 5/27/26 17:12, Christoph Hellwig wrote:
> Add a helper that sets bi_status and calls bio_endio(), as that is a very
> common pattern, and convert the core block code over to it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-core.c | 11 ++++-------
> block/blk-crypto-fallback.c | 9 +++------
> block/blk-crypto.c | 3 +--
> block/blk-merge.c | 6 ++----
> block/blk-mq.c | 6 ++----
> block/fops.c | 3 +--
> include/linux/bio.h | 19 +++++++++++++++----
> 7 files changed, 28 insertions(+), 29 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply
* [linux-next:master] [block] 73cf422c6a: EIP:look_up_lock_class
From: kernel test robot @ 2026-05-28 7:05 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: oe-lkp, lkp, linux-block, oliver.sang
Hello,
kernel test robot noticed "EIP:look_up_lock_class" on:
commit: 73cf422c6afb067b7e5d395dae8c44e076a00ebb ("block: assign caller-specific lockdep class to disk->open_mutex")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master e7d700e14934e68f86338c5610cf2ae76798b663]
in testcase: trinity
version:
with following parameters:
runtime: 300s
group: group-01
nr_groups: 5
config: i386-randconfig-007-20260527
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202605281444.29e7f79b-lkp@intel.com
[ 15.148854][ T192] ------------[ cut here ]------------
[ 15.150937][ T192] Looking for class "&cd->lock" with key sr_probe.__key, but found a different class "&disk->open_mutex" with the same key
[ 15.154899][ T192] WARNING: kernel/locking/lockdep.c:944 at look_up_lock_class+0xfe/0x140, CPU#0: udevd/192
[ 15.157990][ T192] Modules linked in:
[ 15.159228][ T192] CPU: 0 UID: 0 PID: 192 Comm: udevd Not tainted 7.1.0-rc5-00007-g73cf422c6afb #1 PREEMPT(lazy) 9df36ae0f1aead0a85dee313de9d5bfe012b6acd
[ 15.163724][ T192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 15.166935][ T192] EIP: look_up_lock_class (locking/lockdep.c:941)
[ 15.168594][ T192] Code: f8 64 fd 83 c4 08 0f 0b eb cd 80 3d e5 95 8f 45 00 75 8f c6 05 e5 95 8f 45 01 51 52 50 68 c7 4b f8 44 e8 c5 f8 64 fd 83 c4 10 <0f> 0b e9 71 ff ff ff b8 90 88 25 45 89 f2 e8 af 10 35 fe 83 3d 18
All code
========
0: f8 clc
1: 64 fd fs std
3: 83 c4 08 add $0x8,%esp
6: 0f 0b ud2
8: eb cd jmp 0xffffffffffffffd7
a: 80 3d e5 95 8f 45 00 cmpb $0x0,0x458f95e5(%rip) # 0x458f95f6
11: 75 8f jne 0xffffffffffffffa2
13: c6 05 e5 95 8f 45 01 movb $0x1,0x458f95e5(%rip) # 0x458f95ff
1a: 51 push %rcx
1b: 52 push %rdx
1c: 50 push %rax
1d: 68 c7 4b f8 44 push $0x44f84bc7
22: e8 c5 f8 64 fd call 0xfffffffffd64f8ec
27: 83 c4 10 add $0x10,%esp
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 71 ff ff ff jmp 0xffffffffffffffa2
31: b8 90 88 25 45 mov $0x45258890,%eax
36: 89 f2 mov %esi,%edx
38: e8 af 10 35 fe call 0xfffffffffe3510ec
3d: 83 .byte 0x83
3e: 3d .byte 0x3d
3f: 18 .byte 0x18
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 71 ff ff ff jmp 0xffffffffffffff78
7: b8 90 88 25 45 mov $0x45258890,%eax
c: 89 f2 mov %esi,%edx
e: e8 af 10 35 fe call 0xfffffffffe3510c2
13: 83 .byte 0x83
14: 3d .byte 0x3d
15: 18 .byte 0x18
[ 15.174645][ T192] EAX: 00000000 EBX: 465c09fc ECX: 00000000 EDX: 00000000
[ 15.176847][ T192] ESI: 45f5dacc EDI: bf79d6b0 EBP: 494d1ae0 ESP: 494d1ad4
[ 15.179052][ T192] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010006
[ 15.181672][ T192] CR0: 80050033 CR2: 3f831928 CR3: 7e675000 CR4: 000406d0
[ 15.183858][ T192] Call Trace:
[ 15.184927][ T192] ? register_lock_class (locking/lockdep.c:1296)
[ 15.186594][ T192] ? __lock_acquire (locking/lockdep.c:5114)
[ 15.188124][ T192] ? blk_mq_free_request (blk-mq.c:830)
[ 15.189788][ T192] ? scsi_test_unit_ready (scsi/scsi_lib.c:2489)
[ 15.191460][ T192] ? sr_revalidate_disk (scsi/sr.c:484)
[ 15.193069][ T192] ? lock_acquire (locking/lockdep.c:5870)
[ 15.194534][ T192] ? sr_block_open (scsi/sr.c:511)
[ 15.195998][ T192] ? __mutex_lock_common (locking/mutex.c:646)
[ 15.197692][ T192] ? sr_block_open (scsi/sr.c:511)
[ 15.199148][ T192] ? disk_check_media_change (linux/spinlock.h:402 disk-events.c:259 disk-events.c:276)
[ 15.201094][ T192] ? lockdep_hardirqs_on_prepare (locking/lockdep.c:4327 locking/lockdep.c:4412)
[ 15.202969][ T192] ? mutex_lock_nested (locking/mutex.c:820 locking/mutex.c:873)
[ 15.204515][ T192] ? sr_block_open (scsi/sr.c:511)
[ 15.205963][ T192] ? sr_block_open (scsi/sr.c:511)
[ 15.207402][ T192] ? blkdev_get_whole (bdev.c:738)
[ 15.208937][ T192] ? bdev_open (bdev.c:965)
[ 15.210863][ T192] ? blkdev_open (fops.c:697)
[ 15.212973][ T192] ? blkdev_write_iter (fops.c:809)
[ 15.215448][ T192] ? do_dentry_open (open.c:947)
[ 15.217726][ T192] ? inode_permission (namei.c:656)
[ 15.220026][ T192] ? vfs_open (open.c:1079)
[ 15.222005][ T192] ? path_openat (namei.c:4699)
[ 15.224177][ T192] ? do_file_open (namei.c:4887)
[ 15.226381][ T192] ? do_sys_openat2 (open.c:1364)
[ 15.228648][ T192] ? __ia32_sys_openat (open.c:1370 open.c:1386 open.c:1381 open.c:1381)
[ 15.231073][ T192] ? ia32_sys_call (kbuild/obj/consumer/i386-randconfig-007-20260527/./arch/x86/include/generated/asm/syscalls_32.h:296)
[ 15.233322][ T192] ? do_int80_syscall_32 (x86/entry/syscall_32.c:83)
[ 15.235698][ T192] ? _copy_to_user (x86/include/asm/uaccess_32.h:20 linux/uaccess.h:206 usercopy.c:26)
[ 15.237840][ T192] ? do_int80_syscall_32 (linux/entry-common.h:317)
[ 15.240356][ T192] ? rcu_is_watching (rcu/tree.c:753)
[ 15.242605][ T192] ? do_int80_syscall_32 (linux/randomize_kstack.h:57)
[ 15.245000][ T192] ? entry_INT80_32 (x86/entry/entry_32.S:940)
[ 15.247150][ T192] ? exc_page_fault (x86/mm/fault.c:1530)
[ 15.249320][ T192] ? entry_INT80_32 (x86/entry/entry_32.S:940)
[ 15.251034][ T192] irq event stamp: 4795
[ 15.252378][ T192] hardirqs last enabled at (4795): _raw_spin_unlock_irqrestore (linux/spinlock_api_smp.h:178 locking/spinlock.c:198)
[ 15.255395][ T192] hardirqs last disabled at (4794): _raw_spin_lock_irqsave (linux/spinlock_api_smp.h:130 locking/spinlock.c:166)
[ 15.258260][ T192] softirqs last enabled at (4654): local_bh_enable (linux/bottom_half.h:32)
[ 15.260941][ T192] softirqs last disabled at (4652): local_bh_disable (linux/bottom_half.h:19)
[ 15.263626][ T192] ---[ end trace 0000000000000000 ]---
[ 15.265332][ T192]
[ 15.266134][ T192] ============================================
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260528/202605281444.29e7f79b-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* [PATCH v2 1/1] rust: block: fix GenDisk cleanup paths
From: Ren Wei @ 2026-05-28 7:28 UTC (permalink / raw)
To: linux-block, rust-for-linux
Cc: ojeda, boqun, gary, bjorn3_gh, lossin, a.hindborg, aliceryhl,
tmgross, dakr, daniel.almeida, axboe, tamird, sunke, yuantan098,
bird, royenheart, n05ec
From: Haoze Xie <royenheart@gmail.com>
GenDiskBuilder::build() still has fallible work after
__blk_mq_alloc_disk(), but its error path only recovers the
foreign queue data. That leaks the temporary gendisk and
request_queue until later teardown. If the caller moved the last
Arc<TagSet<T>> into build(), the leaked queue can retain blk-mq
state after the tag set is dropped.
Fix the pre-registration failure path by dropping the temporary
gendisk reference with put_disk() before recovering queue_data,
so disk_release() can tear down the owned queue.
Also pair GenDisk::drop() with put_disk() after del_gendisk().
Once a Rust GenDisk has been added with device_add_disk(),
del_gendisk() only unregisters it; the final gendisk reference
still has to be dropped to complete the release path.
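The reference pairing described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: every
toy_* name is hypothetical, the model keeps a single caller-owned
reference, and it does not reproduce the real gendisk lifetime rules.
It only shows why an unregister step alone never reaches the release
callback, on both the build() failure path and the drop() path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted disk object (not struct gendisk). */
struct toy_disk {
	int refcount;		/* the allocator hands one reference to the caller */
	bool registered;	/* models "added with device_add_disk()" */
	bool *released;		/* set to true when the release path runs */
};

/* Models allocation: the caller owns the initial reference. */
static struct toy_disk *toy_alloc_disk(bool *released)
{
	struct toy_disk *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->registered = false;
	d->released = released;
	*released = false;
	return d;
}

/* Models registration; in this toy model it takes no extra reference. */
static void toy_add_disk(struct toy_disk *d)
{
	d->registered = true;
}

/* Models del_gendisk(): unregisters only, never frees. */
static void toy_del_disk(struct toy_disk *d)
{
	d->registered = false;
}

/* Models put_disk(): dropping the last reference runs the release path. */
static void toy_put_disk(struct toy_disk *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		*d->released = true;
		free(d);
	}
}
```

In this model, calling toy_del_disk() alone leaves the flag behind
`released` false; only the following toy_put_disk() flips it, which is
the pairing the patch adds to GenDisk::drop().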
Fixes: 3253aba3408a ("rust: block: introduce `kernel::block::mq` module")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Haoze Xie <royenheart@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
Changes in v2:
- Add the missing put_disk() after del_gendisk() in GenDisk::drop(),
as suggested by Andreas Hindborg.
- Keep the GenDiskBuilder::build() failure cleanup fix and fold both
lifecycle fixes into one patch.
- v1 Link: https://lore.kernel.org/r/b6411cc055080c984a67bfad72fd683aa84b8e13.1779596478.git.royenheart@gmail.com
rust/kernel/block/mq/gen_disk.rs | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index 912cb805caf5..6ea16b943c99 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -149,6 +149,17 @@ pub fn build<T: Operations>(
// SAFETY: `gendisk` is a valid pointer as we initialized it above
unsafe { (*gendisk).fops = &TABLE };
+ let cleanup_failure = ScopeGuard::new_with_data((gendisk, data), |(gendisk, data)| {
+ // SAFETY: `gendisk` came from `__blk_mq_alloc_disk()` above and
+ // has not been added to the VFS on this cleanup path.
+ unsafe { bindings::put_disk(gendisk) };
+ // SAFETY: `data` came from `into_foreign()` above and has not been
+ // converted back on this cleanup path.
+ drop(unsafe { T::QueueData::from_foreign(data) });
+ });
+ // The failure guard now owns both pieces of cleanup; the early guard
+ // must not run on this path anymore.
+ recover_data.dismiss();
let mut writer = NullTerminatedFormatter::new(
// SAFETY: `gendisk` points to a valid and initialized instance. We
@@ -172,7 +183,7 @@ pub fn build<T: Operations>(
},
)?;
- recover_data.dismiss();
+ cleanup_failure.dismiss();
// INVARIANT: `gendisk` was initialized above.
// INVARIANT: `gendisk` was added to the VFS via `device_add_disk` above.
@@ -214,6 +225,10 @@ fn drop(&mut self) {
// initialized instance of `struct gendisk`, and it was previously added
// to the VFS.
unsafe { bindings::del_gendisk(self.gendisk) };
+ // SAFETY: By type invariant, `self.gendisk` was added to the VFS, so
+ // `put_disk()` must follow `del_gendisk()` to drop the final gendisk
+ // reference and trigger the remaining release path.
+ unsafe { bindings::put_disk(self.gendisk) };
// SAFETY: `queue.queuedata` was created by `GenDiskBuilder::build` with
// a call to `ForeignOwnable::into_foreign` to create `queuedata`.
--
2.47.3
^ permalink raw reply related
* Re: [PATCHv3 1/2] block: export passthrough stats enabled
From: Christoph Hellwig @ 2026-05-28 8:29 UTC (permalink / raw)
To: Keith Busch
Cc: linux-block, linux-nvme, axboe, hch, Keith Busch, Nilay Shroff,
Nitesh Shetty
In-Reply-To: <20260528010041.1533124-2-kbusch@meta.com>
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply
* Re: [PATCHv3 2/2] nvme: add support multipath passthrough iostats
From: Christoph Hellwig @ 2026-05-28 8:29 UTC (permalink / raw)
To: Keith Busch
Cc: linux-block, linux-nvme, axboe, hch, Keith Busch, Nilay Shroff,
Nitesh Shetty
In-Reply-To: <20260528010041.1533124-3-kbusch@meta.com>
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Christoph Hellwig @ 2026-05-28 8:38 UTC (permalink / raw)
To: Damien Le Moal
Cc: Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche,
Christoph Hellwig, linux-block, LKML, Andrew Morton,
Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel,
Christian Brauner
In-Reply-To: <ab5492b6-a053-4e3b-8c59-2e836ae85bc4@kernel.org>
On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote:
> It sounds like the VFS unmount call needs to have something that waits for
> sync() to complete. Though, it really feels very strange that an FS can complete
I don't think this is the VFS-controlled file data writeback, which
we wait on, but some kind of fs-controlled metadata. And yes, it looks
like those file systems are buggy in that area. We definitely had
such bugs in XFS before and fixed them.
e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against
unmount")
^ permalink raw reply
* [PATCH v2] block: rename need_dispatch to piecemeal_dispatch in blk-mq sched
From: Guixin Liu @ 2026-05-28 8:44 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig; +Cc: linux-block, xlpang, oliver.yang
The local boolean in __blk_mq_sched_dispatch_requests() decides whether
to fall back to the per-ctx round-robin path (blk_mq_do_dispatch_ctx())
instead of the batch flush path (blk_mq_flush_busy_ctxs()). The whole
function is about dispatching anyway, so the name "need_dispatch" is
not particularly informative and can mislead readers into thinking that
a false value means "skip dispatching".
Rename it to "piecemeal_dispatch" to match the comment right above the
check ("dequeue request one by one from sw queue if queue is busy")
and to convey the actual intent: take the piecemeal, fair, one-at-a-time
path either when we just drained hctx->dispatch (so the device has
recently pushed back) or when the dispatch_busy EWMA still indicates
congestion. The fast batch path is only taken when neither signal
suggests recent backpressure.
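The path selection described above can be written down as a small
userspace model. This is a sketch only: struct toy_hctx and the
toy_* names are hypothetical and stand in for the few pieces of state
the real function consults, and the enum values merely label which
kernel helper would be called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the dispatch decision reads. */
struct toy_hctx {
	bool dispatch_list_nonempty;	/* hctx->dispatch had requests queued */
	bool dispatch_list_drained;	/* models blk_mq_dispatch_rq_list() returning true */
	unsigned int dispatch_busy;	/* models the hctx->dispatch_busy EWMA */
	bool has_elevator;		/* models hctx->queue->elevator being set */
};

enum toy_path {
	TOY_STOP,	/* dispatch list could not be drained, return early */
	TOY_SCHED,	/* would call blk_mq_do_dispatch_sched() */
	TOY_PIECEMEAL,	/* would call blk_mq_do_dispatch_ctx() */
	TOY_BATCH,	/* would call blk_mq_flush_busy_ctxs() */
};

static enum toy_path toy_pick_dispatch_path(const struct toy_hctx *hctx)
{
	bool piecemeal_dispatch;

	if (hctx->dispatch_list_nonempty) {
		if (!hctx->dispatch_list_drained)
			return TOY_STOP;
		/* The device pushed back recently: be fair, go one by one. */
		piecemeal_dispatch = true;
	} else {
		/* Any nonzero EWMA value still signals congestion. */
		piecemeal_dispatch = hctx->dispatch_busy;
	}

	if (hctx->has_elevator)
		return TOY_SCHED;

	return piecemeal_dispatch ? TOY_PIECEMEAL : TOY_BATCH;
}
```

Note that a false piecemeal_dispatch never means "skip dispatching" in
this model: it selects TOY_BATCH, which is the misunderstanding the
rename is meant to prevent.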
No functional change.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
---
v1 -> v2:
- Rename cautious_dispatch to piecemeal_dispatch, as suggested by Jens.
block/blk-mq-sched.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 0a00f5a76f5a..14dacf99e148 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -267,7 +267,7 @@ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
- bool need_dispatch = false;
+ bool piecemeal_dispatch = false;
LIST_HEAD(rq_list);
/*
@@ -298,16 +298,16 @@ static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
blk_mq_sched_mark_restart_hctx(hctx);
if (!blk_mq_dispatch_rq_list(hctx, &rq_list, true))
return 0;
- need_dispatch = true;
+ piecemeal_dispatch = true;
} else {
- need_dispatch = hctx->dispatch_busy;
+ piecemeal_dispatch = hctx->dispatch_busy;
}
if (hctx->queue->elevator)
return blk_mq_do_dispatch_sched(hctx);
/* dequeue request one by one from sw queue if queue is busy */
- if (need_dispatch)
+ if (piecemeal_dispatch)
return blk_mq_do_dispatch_ctx(hctx);
blk_mq_flush_busy_ctxs(hctx, &rq_list);
blk_mq_dispatch_rq_list(hctx, &rq_list, true);
--
2.43.7
^ permalink raw reply related
* [PATCH v2] block: add a bio_endio_status helper
From: Christoph Hellwig @ 2026-05-28 8:46 UTC (permalink / raw)
To: axboe
Cc: linux-block, Keith Busch, Md Haris Iqbal, Damien Le Moal,
Hannes Reinecke
Add a helper that sets bi_status and calls bio_endio(), as that is a very
common pattern, and convert the core block code over to it.
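For readers outside the block layer, the pattern being factored out can
be modelled in a few lines of userspace C. This is a sketch, not the
kernel code: struct toy_bio, enum toy_status and all toy_* functions are
hypothetical, and the completion step is reduced to a counter.

```c
#include <assert.h>

/* Hypothetical status codes; names and values are illustrative only. */
enum toy_status { TOY_STS_OK, TOY_STS_NOTSUPP, TOY_STS_AGAIN, TOY_STS_IOERR };

/* Hypothetical stand-in for struct bio. */
struct toy_bio {
	enum toy_status status;	/* models bio->bi_status */
	int completions;	/* how often the completion step ran */
};

/* Models bio_endio(): here it only counts the completion. */
static void toy_bio_endio(struct toy_bio *bio)
{
	bio->completions++;
}

/* The helper: record the status, then complete the bio. */
static inline void toy_bio_endio_status(struct toy_bio *bio,
					enum toy_status status)
{
	bio->status = status;
	toy_bio_endio(bio);
}

/* Existing convenience wrappers become one-liners on top of the helper. */
static inline void toy_bio_io_error(struct toy_bio *bio)
{
	toy_bio_endio_status(bio, TOY_STS_IOERR);
}

static inline void toy_bio_wouldblock_error(struct toy_bio *bio)
{
	toy_bio_endio_status(bio, TOY_STS_AGAIN);
}
```

Each call site that previously needed two statements (and often a pair
of braces) collapses to one call, which is where most of the diffstat
reduction comes from.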
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Md Haris Iqbal <haris.iqbal@linux.dev>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
---
Changes since v1:
- fix the function name in the kerneldoc comment
block/blk-core.c | 11 ++++-------
block/blk-crypto-fallback.c | 9 +++------
block/blk-crypto.c | 3 +--
block/blk-merge.c | 6 ++----
block/blk-mq.c | 6 ++----
block/fops.c | 3 +--
include/linux/bio.h | 19 +++++++++++++++----
7 files changed, 28 insertions(+), 29 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 22af5dec112b..b0f0a304ea0b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -636,12 +636,10 @@ static void __submit_bio(struct bio *bio)
struct gendisk *disk = bio->bi_bdev->bd_disk;
if ((bio->bi_opf & REQ_POLLED) &&
- !(disk->queue->limits.features & BLK_FEAT_POLL)) {
- bio->bi_status = BLK_STS_NOTSUPP;
- bio_endio(bio);
- } else {
+ !(disk->queue->limits.features & BLK_FEAT_POLL))
+ bio_endio_status(bio, BLK_STS_NOTSUPP);
+ else
disk->fops->submit_bio(bio);
- }
blk_queue_exit(disk->queue);
}
@@ -886,8 +884,7 @@ void submit_bio_noacct(struct bio *bio)
not_supported:
status = BLK_STS_NOTSUPP;
end_io:
- bio->bi_status = status;
- bio_endio(bio);
+ bio_endio_status(bio, status);
}
EXPORT_SYMBOL(submit_bio_noacct);
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 61f595410832..8b04d9205b8d 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -361,8 +361,7 @@ static void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
status = blk_crypto_get_keyslot(blk_crypto_fallback_profile,
bc->bc_key, &slot);
if (status != BLK_STS_OK) {
- src_bio->bi_status = status;
- bio_endio(src_bio);
+ bio_endio_status(src_bio, status);
return;
}
__blk_crypto_fallback_encrypt_bio(src_bio,
@@ -437,8 +436,7 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
}
mempool_free(f_ctx, bio_fallback_crypt_ctx_pool);
- bio->bi_status = status;
- bio_endio(bio);
+ bio_endio_status(bio, status);
}
/**
@@ -499,8 +497,7 @@ bool blk_crypto_fallback_bio_prep(struct bio *bio)
if (!__blk_crypto_cfg_supported(blk_crypto_fallback_profile,
&bc->bc_key->crypto_cfg)) {
- bio->bi_status = BLK_STS_NOTSUPP;
- bio_endio(bio);
+ bio_endio_status(bio, BLK_STS_NOTSUPP);
return false;
}
diff --git a/block/blk-crypto.c b/block/blk-crypto.c
index 856d3c5b1fa0..165c9d2cce07 100644
--- a/block/blk-crypto.c
+++ b/block/blk-crypto.c
@@ -267,8 +267,7 @@ bool __blk_crypto_submit_bio(struct bio *bio)
if (!IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK)) {
pr_warn_once("%pg: crypto API fallback disabled; failing request.\n",
bdev);
- bio->bi_status = BLK_STS_NOTSUPP;
- bio_endio(bio);
+ bio_endio_status(bio, BLK_STS_NOTSUPP);
return false;
}
return blk_crypto_fallback_bio_prep(bio);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index fcf09325b22e..7cc82a7a6f4e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -122,8 +122,7 @@ struct bio *bio_submit_split_bioset(struct bio *bio, unsigned int split_sectors,
struct bio *split = bio_split(bio, split_sectors, GFP_NOIO, bs);
if (IS_ERR(split)) {
- bio->bi_status = errno_to_blk_status(PTR_ERR(split));
- bio_endio(bio);
+ bio_endio_status(bio, errno_to_blk_status(PTR_ERR(split)));
return NULL;
}
@@ -143,8 +142,7 @@ EXPORT_SYMBOL_GPL(bio_submit_split_bioset);
static struct bio *bio_submit_split(struct bio *bio, int split_sectors)
{
if (unlikely(split_sectors < 0)) {
- bio->bi_status = errno_to_blk_status(split_sectors);
- bio_endio(bio);
+ bio_endio_status(bio, errno_to_blk_status(split_sectors));
return NULL;
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c5c16cce4f8..ade9d3a89743 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3187,8 +3187,7 @@ void blk_mq_submit_bio(struct bio *bio)
}
if ((bio->bi_opf & REQ_POLLED) && !blk_mq_can_poll(q)) {
- bio->bi_status = BLK_STS_NOTSUPP;
- bio_endio(bio);
+ bio_endio_status(bio, BLK_STS_NOTSUPP);
goto queue_exit;
}
@@ -3229,8 +3228,7 @@ void blk_mq_submit_bio(struct bio *bio)
ret = blk_crypto_rq_get_keyslot(rq);
if (ret != BLK_STS_OK) {
- bio->bi_status = ret;
- bio_endio(bio);
+ bio_endio_status(bio, ret);
blk_mq_free_request(rq);
return;
}
diff --git a/block/fops.c b/block/fops.c
index ffe7b2042f4e..15783a6180de 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -218,8 +218,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
ret = blkdev_iov_iter_get_pages(bio, iter, bdev);
if (unlikely(ret)) {
- bio->bi_status = BLK_STS_IOERR;
- bio_endio(bio);
+ bio_endio_status(bio, BLK_STS_IOERR);
break;
}
if (iocb->ki_flags & IOCB_NOWAIT) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7597ae4dc52b..e86c0d2613e2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -371,16 +371,27 @@ void submit_bio(struct bio *bio);
extern void bio_endio(struct bio *);
-static inline void bio_io_error(struct bio *bio)
+/**
+ * bio_endio_status - end I/O on a bio with a specific status
+ * @bio: bio
+ * @status: status to set
+ *
+ * Set @bio->bi_status to @status and call bio_endio().
+ **/
+static inline void bio_endio_status(struct bio *bio, blk_status_t status)
{
- bio->bi_status = BLK_STS_IOERR;
+ bio->bi_status = status;
bio_endio(bio);
}
+static inline void bio_io_error(struct bio *bio)
+{
+ bio_endio_status(bio, BLK_STS_IOERR);
+}
+
static inline void bio_wouldblock_error(struct bio *bio)
{
- bio->bi_status = BLK_STS_AGAIN;
- bio_endio(bio);
+ bio_endio_status(bio, BLK_STS_AGAIN);
}
/*
--
2.53.0
^ permalink raw reply related
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Qu Wenruo @ 2026-05-28 10:16 UTC (permalink / raw)
To: Christoph Hellwig, Damien Le Moal
Cc: Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche, linux-block,
LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba,
linux-fsdevel, Christian Brauner
In-Reply-To: <20260528083848.GA7694@lst.de>
On 2026/5/28 18:08, Christoph Hellwig wrote:
> On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote:
>> It sounds like the VFS unmount call needs to have something that waits for
>> sync() to complete. Though, it really feels very strange that an FS can complete
>
> I don't think this is the VFS-controlled file data writeback, which
> we wait on, but some kind of fs-controlled metadata. And yes, it looks
> like those file systems are buggy in that area. We definitely had
> such bugs in XFS before and fixed them.
>
> e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against
> unmount")
The XFS fix is pretty old and predates the fix hints in fstests, so no
test case mentions it.
Do you happen to know which test case covers that fix?
I'd like to adapt it for btrfs as a reproducer.
This syzbot report doesn't provide a reproducer.
Another thing: if some btrfs bios are still in flight after
close_ctree(), the most common symptom should be a NULL pointer
dereference inside the various btrfs endio functions.
All those end_bbio_*() functions refer to either fs_info or
inode/eb, so if the fs is unmounted before the bio finishes, they
should all cause a use-after-free.
The only exception is discard, which uses blkdev_issue_discard()
and thus holds no such reference to btrfs internal structures, but that's
beyond my understanding.
Thanks,
Qu
^ permalink raw reply
* Re: [linux-next:master] [block] 73cf422c6a: EIP:look_up_lock_class
From: Tetsuo Handa @ 2026-05-28 11:38 UTC (permalink / raw)
To: kernel test robot; +Cc: oe-lkp, lkp, linux-block
In-Reply-To: <202605281444.29e7f79b-lkp@intel.com>
On 2026/05/28 16:05, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "EIP:look_up_lock_class" on:
>
> commit: 73cf422c6afb067b7e5d395dae8c44e076a00ebb ("block: assign caller-specific lockdep class to disk->open_mutex")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [test failed on linux-next/master e7d700e14934e68f86338c5610cf2ae76798b663]
Thank you. Fixed in https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/ .
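As background on the warning text quoted above ("found a different class
... with the same key"), the condition it reports can be sketched with a
toy key-to-class table. This is an assumption-laden illustration, not
lockdep itself: the table, the toy_* names and the return codes are all
hypothetical. It only shows how one key shared by two differently named
locks is detected at lookup time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical class record: one entry per key, remembering the first name. */
struct toy_class {
	const void *key;
	const char *name;
};

#define TOY_MAX_CLASSES 8

static struct toy_class toy_classes[TOY_MAX_CLASSES];
static size_t toy_nr_classes;

enum toy_lookup { TOY_NEW, TOY_FOUND, TOY_NAME_MISMATCH, TOY_FULL };

/* Look the key up; a hit under a different lock name is the reported case. */
static enum toy_lookup toy_register_lock(const void *key, const char *name)
{
	for (size_t i = 0; i < toy_nr_classes; i++) {
		if (toy_classes[i].key != key)
			continue;
		return strcmp(toy_classes[i].name, name) == 0 ?
			TOY_FOUND : TOY_NAME_MISMATCH;
	}

	if (toy_nr_classes == TOY_MAX_CLASSES)
		return TOY_FULL;

	toy_classes[toy_nr_classes].key = key;
	toy_classes[toy_nr_classes].name = name;
	toy_nr_classes++;
	return TOY_NEW;
}
```

Registering "&disk->open_mutex" and then "&cd->lock" against the same
key address yields TOY_NAME_MISMATCH in this model, mirroring the two
names in the report.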
^ permalink raw reply
* Re: [PATCH v2 1/1] rust: block: fix GenDisk cleanup paths
From: Andreas Hindborg @ 2026-05-28 11:50 UTC (permalink / raw)
To: Ren Wei, linux-block, rust-for-linux
Cc: ojeda, boqun, gary, bjorn3_gh, lossin, aliceryhl, tmgross, dakr,
daniel.almeida, axboe, tamird, sunke, yuantan098, bird,
royenheart, n05ec
In-Reply-To: <e14c015e2e0bde04f84a9452330b94436e2d8e68.1779901336.git.royenheart@gmail.com>
Ren Wei <n05ec@lzu.edu.cn> writes:
> From: Haoze Xie <royenheart@gmail.com>
>
> GenDiskBuilder::build() still has fallible work after
> __blk_mq_alloc_disk(), but its error path only recovers the
> foreign queue data. That leaks the temporary gendisk and
> request_queue until later teardown. If the caller moved the last
> Arc<TagSet<T>> into build(), the leaked queue can retain blk-mq
> state after the tag set is dropped.
>
> Fix the pre-registration failure path by dropping the temporary
> gendisk reference with put_disk() before recovering queue_data,
> so disk_release() can tear down the owned queue.
>
> Also pair GenDisk::drop() with put_disk() after del_gendisk().
> Once a Rust GenDisk has been added with device_add_disk(),
> del_gendisk() only unregisters it; the final gendisk reference
> still has to be dropped to complete the release path.
>
> Fixes: 3253aba3408a ("rust: block: introduce `kernel::block::mq` module")
> Cc: stable@kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Signed-off-by: Haoze Xie <royenheart@gmail.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Looks good to me, but could you please add some newlines for
readability:
diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index 6ea16b943c99..fc97dd873974 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -149,6 +149,7 @@ pub fn build<T: Operations>(
// SAFETY: `gendisk` is a valid pointer as we initialized it above
unsafe { (*gendisk).fops = &TABLE };
+
let cleanup_failure = ScopeGuard::new_with_data((gendisk, data), |(gendisk, data)| {
// SAFETY: `gendisk` came from `__blk_mq_alloc_disk()` above and
// has not been added to the VFS on this cleanup path.
@@ -157,6 +158,7 @@ pub fn build<T: Operations>(
// converted back on this cleanup path.
drop(unsafe { T::QueueData::from_foreign(data) });
});
+
// The failure guard now owns both pieces of cleanup; the early guard
// must not run on this path anymore.
recover_data.dismiss();
@@ -225,6 +227,7 @@ fn drop(&mut self) {
// initialized instance of `struct gendisk`, and it was previously added
// to the VFS.
unsafe { bindings::del_gendisk(self.gendisk) };
+
// SAFETY: By type invariant, `self.gendisk` was added to the VFS, so
// `put_disk()` must follow `del_gendisk()` to drop the final gendisk
// reference and trigger the remaining release path.
With those changes:
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Best regards,
Andreas Hindborg
^ permalink raw reply related
* Re: [PATCH v2] scsi: bsg: read io_uring command fields once
From: 杨秀伟 @ 2026-05-28 12:39 UTC (permalink / raw)
To: rc, James E . J . Bottomley, Martin K . Petersen, Jens Axboe,
FUJITA Tomonori
Cc: linux-scsi, linux-block, io-uring, linux-kernel, Bart Van Assche,
Caleb Sander Mateos, stable
[-- Attachment #1: Type: text/html, Size: 8523 bytes --]
^ permalink raw reply
* Re: [PATCH] block: mark biovec_init_pool static
From: Jens Axboe @ 2026-05-28 14:01 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
In-Reply-To: <20260527150646.2349405-1-hch@lst.de>
On Wed, 27 May 2026 17:06:46 +0200, Christoph Hellwig wrote:
>
Applied, thanks!
[1/1] block: mark biovec_init_pool static
commit: 353c85082a82fa6d78cbb3821749d5982ffed9f4
Best regards,
--
Jens Axboe
^ permalink raw reply
* Re: clean up bvec iter helpers
From: Jens Axboe @ 2026-05-28 14:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Ming Lei, Bart Van Assche,
Caleb Sander Mateos, linux-block, linux-nvme
In-Reply-To: <20260527151043.2349900-1-hch@lst.de>
On Wed, 27 May 2026 17:10:19 +0200, Christoph Hellwig wrote:
> this series converts the bvec_iter helpers from macros to inline
> functions, and to facilitate that cleans up a little bit of code
> in the loop and nvme-tcp drivers first.
>
> Diffstat:
> drivers/block/loop.c | 24 ++++-------
> drivers/nvme/host/tcp.c | 27 ++++--------
> include/linux/bvec.h | 101 ++++++++++++++++++++++++++++++------------------
> 3 files changed, 84 insertions(+), 68 deletions(-)
>
> [...]
Applied, thanks!
[1/3] loop: cleanup lo_rw_aio
commit: 7dea9029721675d475e093116cef569253960e06
[2/3] nvme-tcp: cleanup nvme_tcp_init_iter
commit: adf3a5cef1a839e388dc382b3e07623f52746322
[3/3] bvec: make the bvec_iter helpers inline functions
commit: f6fe52a7b18675d76d7f7dae0c16f412a4e33f9a
Best regards,
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH v2] block: add a bio_endio_status helper
From: Jens Axboe @ 2026-05-28 14:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block, Keith Busch, Md Haris Iqbal, Damien Le Moal,
Hannes Reinecke
In-Reply-To: <20260528084632.2505277-1-hch@lst.de>
On Thu, 28 May 2026 10:46:13 +0200, Christoph Hellwig wrote:
> Add a helper that sets bi_status and calls bio_endio(), as that is a very
> common pattern, and convert the core block code over to it.
Applied, thanks!
[1/1] block: add a bio_endio_status helper
commit: a7d8eaee7fafe2e2c58aef9579bdef778c144029
Best regards,
--
Jens Axboe
^ permalink raw reply
* Re: [PATCHv3 0/2] block, nvme: enable passthrough iostats
From: Jens Axboe @ 2026-05-28 15:35 UTC (permalink / raw)
To: linux-block, linux-nvme, Keith Busch; +Cc: hch, Keith Busch
In-Reply-To: <20260528010041.1533124-1-kbusch@meta.com>
On Wed, 27 May 2026 18:00:39 -0700, Keith Busch wrote:
> v2->v3:
>
> Added kerneldoc for the exported API
>
> Added code comment for the passthrough safety
>
> Added reviews.
>
> [...]
Applied, thanks!
[1/2] block: export passthrough stats enabled
commit: b7f40ab50190e2500c3c297d15e00040dca47feb
[2/2] nvme: add support multipath passthrough iostats
commit: 7d6eb455ecf0f95c54257ae372ac1272cff834e3
Best regards,
--
Jens Axboe
^ permalink raw reply
* [PATCH v2] nvme-multipath: set BIO_REMAPPED on bios remapped to per-path namespace disks
From: Achkinazi, Igor @ 2026-05-28 15:24 UTC (permalink / raw)
To: kbusch@kernel.org, hch@lst.de, sagi@grimberg.me, axboe@kernel.dk
Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <MW5PR19MB548483D1FAE4F322E4C97352FD032@MW5PR19MB5484.namprd19.prod.outlook.com>
When nvme_ns_head_submit_bio() remaps a bio from the multipath head to
a per-path namespace, bio_set_dev() clears BIO_REMAPPED. The remapped
bio is then resubmitted through submit_bio_noacct() which calls
bio_check_eod() because BIO_REMAPPED is not set.
This races with nvme_ns_remove() which zeroes the per-path capacity
before synchronize_srcu():
  CPU 0 (IO submission)
  ---------------------
  srcu_read_lock()
  nvme_find_path() -> ns
    [NVME_NS_READY is set]

  CPU 1 (namespace removal)
  -------------------------
  clear_bit(NVME_NS_READY)
  set_capacity(ns->disk, 0)
  synchronize_srcu()  <- blocks

  CPU 0 (IO submission)
  ---------------------
  bio_set_dev(bio, ns->disk->part0)
    [clears BIO_REMAPPED]
  submit_bio_noacct(bio)
    -> bio_check_eod() sees capacity=0
    -> bio fails with IO error
The SRCU read lock prevents synchronize_srcu() from completing, but
does not prevent set_capacity(0) from executing. The bio fails the
EOD check before it reaches the NVMe driver, so nvme_failover_req()
never gets a chance to redirect it to another path. IO errors
are reported to the application despite another path being available.
On older kernels (before commit 0b64682e78f7 "block: skip unnecessary
checks for split bio"), the same race was also reachable through split
remainders resubmitted via submit_bio_noacct().
Observed during NVMe multipath failover testing at Dell on
5.14.0-570.23.1.el9_6.x86_64 (RHEL 9.7) and
6.4.0-150600.23.53-default (SLES 15.6).
Fix this by setting BIO_REMAPPED after bio_set_dev() in
nvme_ns_head_submit_bio(). This skips bio_check_eod() on the per-path
device; the EOD check already passed on the multipath head.
NVMe per-path namespace devices are always whole disks (bd_partno=0),
so the blk_partition_remap() skip also gated by BIO_REMAPPED is a
no-op. The flag does not persist across failover and cannot go stale
if the namespace geometry changes between attempts: nvme_failover_req()
calls bio_set_dev() to redirect the bio back to the multipath head,
which clears BIO_REMAPPED. When nvme_requeue_work() resubmits through
submit_bio_noacct(), bio_check_eod() runs normally against the current
capacity.
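The interaction described above can be modelled in userspace roughly as follows. This is a sketch, not kernel code: the types are stand-ins, submit_to_path() and its set_remapped parameter are hypothetical helpers introduced only to contrast the behaviour with and without the flag, and locking and SRCU are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; names mirror the kernel but the types are stand-ins. */
#define BIO_REMAPPED (1u << 0)

struct disk {
	unsigned long long capacity;       /* in sectors */
};

struct bio {
	struct disk *dev;
	unsigned int flags;
	unsigned long long sector;
	unsigned int nr_sectors;
};

/* Like bio_set_dev(): retargeting a bio drops BIO_REMAPPED. */
static void bio_set_dev(struct bio *bio, struct disk *dev)
{
	bio->flags &= ~BIO_REMAPPED;
	bio->dev = dev;
}

/* End-of-device check: true means the bio runs past the device. */
static bool bio_check_eod(const struct bio *bio)
{
	unsigned long long cap = bio->dev->capacity;

	return bio->nr_sectors &&
	       (cap < bio->nr_sectors || cap - bio->nr_sectors < bio->sector);
}

/* Like submit_bio_noacct(): 0 on success, -1 for an I/O error. */
static int submit_bio_noacct(const struct bio *bio)
{
	assert(bio != NULL && bio->dev != NULL);
	if (!(bio->flags & BIO_REMAPPED) && bio_check_eod(bio))
		return -1;
	return 0;
}

/* Hypothetical helper: head-to-path remap; set_remapped models the fix. */
static int submit_to_path(struct bio *bio, struct disk *path, bool set_remapped)
{
	bio_set_dev(bio, path);
	if (set_remapped)
		bio->flags |= BIO_REMAPPED;
	return submit_bio_noacct(bio);
}
```

With a path whose capacity has already been zeroed, the model fails the bio when the flag is left clear and lets it through when the flag is set. Calling bio_set_dev() again to point the bio back at the head clears the flag, so the next submission is checked against the head's current capacity.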
Same approach as commit 3a905c37c351 ("block: skip bio_check_eod for
partition-remapped bios").
A broader solution that moves bio validation into the queue-entered
context and eliminates the set_capacity(0) hack is being developed
upstream; however, this minimal fix is suitable for backporting to
the stable kernels affected today. The patch in question:
https://lore.kernel.org/linux-block/20260519172326.3462354-1-kbusch@meta.com/
Fixes: a7c7f7b2b641 ("nvme: use bio_set_dev to assign ->bi_bdev")
Cc: stable@vger.kernel.org
Signed-off-by: Igor Achkinazi <igor.achkinazi@dell.com>
---
v2:
- Corrected race description: primary race is in the initial
submit_bio_noacct() call in nvme_ns_head_submit_bio(), not
only in split remainders (which are no longer affected on
current mainline since commit 0b64682e78f7)
- Dropped incorrect arguments about submit_bio_noacct_nocheck
export status and BIO_REMAPPED propagation to split clones
- Added analysis showing BIO_REMAPPED flag does not persist
across failover (nvme_failover_req clears it via bio_set_dev)
- Referenced upstream RFC series addressing the root cause
drivers/nvme/host/multipath.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 263161cb8ac0..04f7c7e59945 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -511,6 +511,13 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
ns = nvme_find_path(head);
if (likely(ns)) {
bio_set_dev(bio, ns->disk->part0);
+ /*
+ * Skip bio_check_eod() when this bio enters
+ * submit_bio_noacct() for the per-path device.
+ * The EOD check already passed on the multipath head.
+ */
+ bio_set_flag(bio, BIO_REMAPPED);
bio->bi_opf |= REQ_NVME_MPATH;
trace_block_bio_remap(bio, disk_devt(ns->head->disk),
bio->bi_iter.bi_sector);
--
2.43.0
^ permalink raw reply related
* [PATCH v2 0/2] Add bvec_folio and its kernel-doc
From: Matthew Wilcox (Oracle) @ 2026-05-28 17:59 UTC (permalink / raw)
To: Jens Axboe
Cc: Matthew Wilcox (Oracle), linux-block, linux-kernel, io-uring,
linux-mm, Leon Romanovsky, Christoph Hellwig
Add the convenience helper bvec_folio() to avoid references to bv_page.
Convert a few of the obvious users.
v2:
- Tweak the kernel-doc (Christoph)
- Add the bvec kerneldoc to the documentation build
Matthew Wilcox (Oracle) (2):
block: Add bvec_folio()
block: Include bvec.h kernel-doc in the htmldocs
Documentation/core-api/kernel-api.rst | 1 +
block/bio.c | 6 +++---
include/linux/bio.h | 2 +-
include/linux/bvec.h | 17 +++++++++++++++++
io_uring/rsrc.c | 2 +-
mm/page_io.c | 4 ++--
6 files changed, 25 insertions(+), 7 deletions(-)
--
2.47.3
^ permalink raw reply
* [PATCH v2 2/2] block: Include bvec.h kernel-doc in the htmldocs
From: Matthew Wilcox (Oracle) @ 2026-05-28 17:59 UTC (permalink / raw)
To: Jens Axboe
Cc: Matthew Wilcox (Oracle), linux-block, linux-kernel, io-uring,
linux-mm, Leon Romanovsky, Christoph Hellwig
In-Reply-To: <20260528175905.1102280-1-willy@infradead.org>
People have gone to the trouble of writing this kernel-doc; the
least we can do is publish it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
Documentation/core-api/kernel-api.rst | 1 +
include/linux/bvec.h | 2 ++
2 files changed, 3 insertions(+)
diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index e8211c4ca662..4c4a57c1c094 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -307,6 +307,7 @@ Accounting Framework
Block Devices
=============
+.. kernel-doc:: include/linux/bvec.h
.. kernel-doc:: include/linux/bio.h
.. kernel-doc:: block/blk-core.c
:export:
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 27ac3fcc6d9e..09d6bb76919e 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -262,6 +262,7 @@ static inline void *bvec_kmap_local(struct bio_vec *bvec)
/**
* memcpy_from_bvec - copy data from a bvec
+ * @to: Kernel virtual address to copy to.
* @bvec: bvec to copy from
*
* Must be called on single-page bvecs only.
@@ -274,6 +275,7 @@ static inline void memcpy_from_bvec(char *to, struct bio_vec *bvec)
/**
* memcpy_to_bvec - copy data to a bvec
* @bvec: bvec to copy to
+ * @from: Kernel virtual address to copy from.
*
* Must be called on single-page bvecs only.
*/
--
2.47.3
^ permalink raw reply related
* [PATCH v2 1/2] block: Add bvec_folio()
From: Matthew Wilcox (Oracle) @ 2026-05-28 17:59 UTC (permalink / raw)
To: Jens Axboe
Cc: Matthew Wilcox (Oracle), linux-block, linux-kernel, io-uring,
linux-mm, Leon Romanovsky, Christoph Hellwig
In-Reply-To: <20260528175905.1102280-1-willy@infradead.org>
This is a simple helper which replaces page_folio(bvec->bv_page).
Minor improvement in readability, but the real motivation is to reduce
the number of references to bvec->bv_page so that it can be changed
with less work.
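One caveat from the helper's kernel-doc is that it returns only the first folio, while a bvec can span several. A small userspace model of that behaviour, assuming simplified stand-in types (each page simply records its folio), a fixed 4096-byte model page size, and a hypothetical bvec_single_folio() check that is not part of the patch:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u              /* assumption for this sketch */

/* Stand-ins: each page records the folio that contains it. */
struct folio { int id; };
struct page { struct folio *owner; };

struct bio_vec {
	struct page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Model of page_folio(): look up the containing folio. */
static struct folio *page_folio(const struct page *page)
{
	return page->owner;
}

/* Same shape as the new helper: the folio of the first page only. */
static struct folio *bvec_folio(const struct bio_vec *bv)
{
	assert(bv != NULL && bv->bv_page != NULL);
	return page_folio(bv->bv_page);
}

/* Hypothetical check: 1 if every page the bvec covers is in that folio. */
static int bvec_single_folio(const struct bio_vec *bv)
{
	size_t last;

	assert(bv->bv_len > 0);
	last = ((size_t)bv->bv_offset + bv->bv_len - 1) / MODEL_PAGE_SIZE;
	for (size_t i = 1; i <= last; i++)
		if (page_folio(&bv->bv_page[i]) != bvec_folio(bv))
			return 0;
	return 1;
}
```

In this model a two-page bvec whose pages belong to different folios still yields the first folio from bvec_folio(), which is why iterating over folios can be the better interface for such callers.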
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Leon Romanovsky <leon@kernel.org>
---
block/bio.c | 6 +++---
include/linux/bio.h | 2 +-
include/linux/bvec.h | 15 +++++++++++++++
io_uring/rsrc.c | 2 +-
mm/page_io.c | 4 ++--
5 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 5f10900b3f42..85aab3140909 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1300,7 +1300,7 @@ static void bio_free_folios(struct bio *bio)
int i;
bio_for_each_bvec_all(bv, bio, i) {
- struct folio *folio = page_folio(bv->bv_page);
+ struct folio *folio = bvec_folio(bv);
if (!is_zero_folio(folio))
folio_put(folio);
@@ -1409,7 +1409,7 @@ int bio_iov_iter_bounce(struct bio *bio, struct iov_iter *iter, size_t maxlen,
static void bvec_unpin(struct bio_vec *bv, bool mark_dirty)
{
- struct folio *folio = page_folio(bv->bv_page);
+ struct folio *folio = bvec_folio(bv);
size_t nr_pages = (bv->bv_offset + bv->bv_len - 1) / PAGE_SIZE -
bv->bv_offset / PAGE_SIZE + 1;
@@ -1443,7 +1443,7 @@ static void bio_iov_iter_unbounce_read(struct bio *bio, bool is_error,
bvec_unpin(&bio->bi_io_vec[1 + i], mark_dirty);
}
- folio_put(page_folio(bio->bi_io_vec[0].bv_page));
+ folio_put(bvec_folio(&bio->bi_io_vec[0]));
}
/**
diff --git a/include/linux/bio.h b/include/linux/bio.h
index dc17780d6c1e..6613ab4519bd 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -283,7 +283,7 @@ static inline void bio_first_folio(struct folio_iter *fi, struct bio *bio,
return;
}
- fi->folio = page_folio(bvec->bv_page);
+ fi->folio = bvec_folio(bvec);
fi->offset = bvec->bv_offset +
PAGE_SIZE * folio_page_idx(fi->folio, bvec->bv_page);
fi->_seg_count = bvec->bv_len;
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index d36dd476feda..27ac3fcc6d9e 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -74,6 +74,21 @@ static inline void bvec_set_virt(struct bio_vec *bv, void *vaddr,
bvec_set_page(bv, virt_to_page(vaddr), len, offset_in_page(vaddr));
}
+/**
+ * bvec_folio - Return the first folio referenced by this bvec
+ * @bv: bvec to access
+ *
+ * A bvec can contain non-folio memory, so this should only be called by
+ * the creator of the bvec; drivers have no business looking at the owner
+ * of the memory. It may not even be the right interface for the caller
+ * to use as a bvec can span multiple folios. You may be better off using
+ * something like bio_for_each_folio_all() which iterates over all folios.
+ */
+static inline struct folio *bvec_folio(const struct bio_vec *bv)
+{
+ return page_folio(bv->bv_page);
+}
+
struct bvec_iter {
/*
* Current device address in 512 byte sectors. Only updated by the bio
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 650303626be6..5d792f70ec1e 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -102,7 +102,7 @@ static void io_release_ubuf(void *priv)
unsigned int i;
for (i = 0; i < imu->nr_bvecs; i++) {
- struct folio *folio = page_folio(imu->bvec[i].bv_page);
+ struct folio *folio = bvec_folio(&imu->bvec[i]);
unpin_user_folio(folio, 1);
}
diff --git a/mm/page_io.c b/mm/page_io.c
index 70cea9e24d2f..a59b73f8bdd9 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -490,7 +490,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
if (ret == sio->len) {
for (p = 0; p < sio->pages; p++) {
- struct folio *folio = page_folio(sio->bvec[p].bv_page);
+ struct folio *folio = bvec_folio(&sio->bvec[p]);
count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN);
count_memcg_folio_events(folio, PSWPIN, folio_nr_pages(folio));
@@ -500,7 +500,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
count_vm_events(PSWPIN, sio->len >> PAGE_SHIFT);
} else {
for (p = 0; p < sio->pages; p++) {
- struct folio *folio = page_folio(sio->bvec[p].bv_page);
+ struct folio *folio = bvec_folio(&sio->bvec[p]);
folio_unlock(folio);
}
--
2.47.3
^ permalink raw reply related
* Re: [PATCH v2] nvme-multipath: set BIO_REMAPPED on bios remapped to per-path namespace disks
From: Keith Busch @ 2026-05-28 18:19 UTC (permalink / raw)
To: Achkinazi, Igor
Cc: hch@lst.de, sagi@grimberg.me, axboe@kernel.dk,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <DS0PR19MB76963295FC34844B413479F9FD092@DS0PR19MB7696.namprd19.prod.outlook.com>
On Thu, May 28, 2026 at 03:24:27PM +0000, Achkinazi, Igor wrote:
> The SRCU read lock prevents synchronize_srcu() from completing, but
> does not prevent set_capacity(0) from executing. The bio fails the
> EOD check before it reaches the NVMe driver, so nvme_failover_req()
> never gets a chance to redirect it to another path. IO errors
> are reported to the application despite another path being available.
I double-checked the sequences here, and yes, I think the
synchronize_srcu() calls already in place ensure that every caller sees the
EOD error before it could fail bio_queue_enter(), so this looks like it
happens to be sufficient. I'm okay with it.
^ permalink raw reply
* [PATCH v5 01/12] block: Annotate the queue limits functions
From: Bart Van Assche @ 2026-05-28 19:45 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Bart Van Assche
In-Reply-To: <cover.1779997063.git.bvanassche@acm.org>
Let the thread-safety checker verify whether every start of a queue
limits update is followed by a call to a function that finishes a queue
limits update.
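The contract being annotated can be illustrated with the userspace sketch below. It is a model under stated assumptions: the annotation macros are defined as no-ops here (in the kernel they expand to attributes that Clang checks at compile time), a boolean stands in for the limits_lock mutex, and the validation in the commit function is a toy. The pairing rule is enforced with runtime assertions instead of static analysis.

```c
#include <assert.h>
#include <stdbool.h>

/* No-op stand-ins; the kernel versions are checked by Clang, not at runtime. */
#ifndef __acquires
#define __acquires(x)
#endif
#ifndef __releases
#define __releases(x)
#endif

struct queue_limits {
	unsigned int max_sectors;
};

struct request_queue {
	bool limits_locked;                /* stand-in for the limits_lock mutex */
	struct queue_limits limits;
};

/* Start an update: take the lock and hand back a snapshot of the limits. */
static struct queue_limits queue_limits_start_update(struct request_queue *q)
	__acquires(&q->limits_lock)
{
	assert(!q->limits_locked);
	q->limits_locked = true;
	return q->limits;
}

/* Finish an update: apply if valid (toy rule), and always drop the lock. */
static int queue_limits_commit_update(struct request_queue *q,
				      struct queue_limits *lim)
	__releases(&q->limits_lock)
{
	int ret;

	assert(q->limits_locked);
	ret = lim->max_sectors ? 0 : -1;
	if (!ret)
		q->limits = *lim;
	q->limits_locked = false;
	return ret;
}

/* Abandon an update: drop the lock without changing the limits. */
static void queue_limits_cancel_update(struct request_queue *q)
	__releases(&q->limits_lock)
{
	assert(q->limits_locked);
	q->limits_locked = false;
}
```

A code path that calls queue_limits_start_update() and then returns without reaching either the commit or the cancel function is the kind of mistake the annotations are meant to let the compiler flag.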
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
include/linux/blkdev.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 17270a28c66d..65efbd7fe1a3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1092,15 +1092,17 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
*/
static inline struct queue_limits
queue_limits_start_update(struct request_queue *q)
+ __acquires(&q->limits_lock)
{
mutex_lock(&q->limits_lock);
return q->limits;
}
int queue_limits_commit_update_frozen(struct request_queue *q,
- struct queue_limits *lim);
+ struct queue_limits *lim) __releases(&q->limits_lock);
int queue_limits_commit_update(struct request_queue *q,
- struct queue_limits *lim);
-int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
+ struct queue_limits *lim) __releases(&q->limits_lock);
+int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
+ __must_not_hold(&q->limits_lock);
int blk_validate_limits(struct queue_limits *lim);
/**
@@ -1112,6 +1114,7 @@ int blk_validate_limits(struct queue_limits *lim);
* starting update.
*/
static inline void queue_limits_cancel_update(struct request_queue *q)
+ __releases(&q->limits_lock)
{
mutex_unlock(&q->limits_lock);
}
^ permalink raw reply related
* [PATCH v5 00/12] Enable lock context analysis
From: Bart Van Assche @ 2026-05-28 19:45 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Damien Le Moal, Bart Van Assche
Hi Jens,
The following patch series was merged recently: [PATCH v5 00/36]
Compiler-Based Context- and Locking-Analysis
(https://lore.kernel.org/lkml/20251219154418.3592607-1-elver@google.com/). That
patch series drops support for verifying lock context annotations with sparse
and introduces support for verifying them with Clang. Clang's support for lock
context annotation and verification is better than that of sparse. For example,
__cond_acquires() and __guarded_by() are supported by Clang but not by sparse.
This patch series therefore enables lock context analysis for the block layer
core.
Note: although the Linux kernel documentation specifies Clang 22 as the minimum
version for context analysis support, this patch series requires Clang 23
because it annotates function pointers. A patch that fixes the kernel
documentation has been queued:
https://lore.kernel.org/all/177926568868.711.3058599932884307249.tip-bot2@tip-bot2/
Please consider this patch series for the next merge window.
Thanks,
Bart.
Changes compared to v4:
- Rebased and retested on top of Jens' latest for-next branch.
Changes compared to v3:
- Replaced the "block/bdev: Annotate the blk_holder_ops callback invocations"
patch with a patch that adds __releases() annotations to the function
pointers in struct blk_holder_ops.
- Dropped the blk-zoned patch since a better patch from Christoph has
been merged.
Changes compared to v2:
- Retained the block layer core patches and left out the block driver patches.
- Inlined blkg_conf_open_bdev_frozen() and blkg_conf_close_bdev_frozen().
- In blkg_conf_open_bdev(), added a return statement if the
WARN_ON_ONCE() statement triggers.
- Replaced the "block/ioctl: Add lock context annotations" patch with a
__release() annotation.
- Replaced the blk-zoned patch with a patch from Christoph.
Changes compared to v1:
- Rebased this patch series on top of Jens' for-next branch.
- Included two patches that split blkg_conf_prep() and blkg_conf_exit().
- Modified how patches are split. Split the block layer core patch into
multiple patches and moved the CONTEXT_ANALYSIS := y assignments into the
block driver patches.
- Made the new source code comments easier to comprehend.
- Introduced macros in the mq-deadline and Kyber I/O schedulers to make the
__acquires() expressions easier to read.
- Removed the changes from this series that are not block layer changes.
Bart Van Assche (12):
block: Annotate the queue limits functions
block/bdev: Annotate the blk_holder_ops callback functions
block/cgroup: Split blkg_conf_prep()
block/cgroup: Split blkg_conf_exit()
block/cgroup: Improve lock context annotations
block/cgroup: Inline blkg_conf_{open,close}_bdev_frozen()
block/crypto: Annotate the crypto functions
block/blk-iocost: Add lock context annotations
block/blk-mq-debugfs: Improve lock context annotations
block/kyber: Make the lock context annotations compatible with Clang
block/mq-deadline: Make the lock context annotations compatible with
Clang
block: Enable lock context analysis
block/Makefile | 2 +
block/bfq-cgroup.c | 11 ++++-
block/blk-cgroup.c | 98 ++++++++++----------------------------
block/blk-cgroup.h | 13 +++--
block/blk-crypto-profile.c | 2 +
block/blk-iocost.c | 96 +++++++++++++++++++++++--------------
block/blk-iolatency.c | 19 ++++----
block/blk-mq-debugfs.c | 12 ++---
block/blk-throttle.c | 34 +++++++------
block/blk.h | 4 ++
block/kyber-iosched.c | 7 ++-
block/mq-deadline.c | 12 +++--
include/linux/blkdev.h | 21 +++++---
13 files changed, 173 insertions(+), 158 deletions(-)
^ permalink raw reply