Linux block layer
* [syzbot] [block?] general protection fault in lo_rw_aio
@ 2026-04-18  0:02 syzbot
  2026-04-21 11:05 ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: syzbot @ 2026-04-18  0:02 UTC (permalink / raw)
  To: axboe, linux-block, linux-kernel, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    43cfbdda5af6 Merge tag 'for-linus-iommufd' of git://git.ke..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=101e4702580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4660d1ff2985517b
dashboard link: https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0867fa0b89e8/disk-43cfbdda.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/754859270006/vmlinux-43cfbdda.xz
kernel image: https://storage.googleapis.com/syzbot-assets/cd6cca8d06c9/bzImage-43cfbdda.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+cd8a9a308e879a4e2c28@syzkaller.appspotmail.com

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000014: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
CPU: 1 UID: 0 PID: 1174 Comm: kworker/u8:8 Not tainted syzkaller #0 PREEMPT_{RT,(full)} 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Workqueue: loop2 loop_workfn
RIP: 0010:file_inode include/linux/fs.h:1353 [inline]
RIP: 0010:kiocb_start_write include/linux/fs.h:2763 [inline]
RIP: 0010:lo_rw_aio+0xaa9/0xf00 drivers/block/loop.c:401
Code: 89 33 31 ff 8b 5c 24 44 89 de e8 32 2b 3f fc 85 db 0f 84 ca 00 00 00 48 8b 44 24 58 48 8d 98 a0 00 00 00 48 89 d8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 df e8 e8 ae a5 fc 4c 89 7c 24 10 48 8b
RSP: 0018:ffffc9000655f620 EFLAGS: 00010206
RAX: 0000000000000014 RBX: 00000000000000a0 RCX: ffff888029649ec0
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffc9000655f790 R08: 0000000000000000 R09: 0000000000000000
R10: ffffc9000655f6e3 R11: fffff52000cabede R12: ffff888026c9b090
R13: dffffc0000000000 R14: 0000000000000000 R15: ffff888026c9b0b0
FS:  0000000000000000(0000) GS:ffff88812620f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555591947a68 CR3: 000000005515a000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 do_req_filebacked drivers/block/loop.c:433 [inline]
 loop_handle_cmd drivers/block/loop.c:1925 [inline]
 loop_process_work+0x637/0x11b0 drivers/block/loop.c:1960
 process_one_work kernel/workqueue.c:3302 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
   0:	89 33                	mov    %esi,(%rbx)
   2:	31 ff                	xor    %edi,%edi
   4:	8b 5c 24 44          	mov    0x44(%rsp),%ebx
   8:	89 de                	mov    %ebx,%esi
   a:	e8 32 2b 3f fc       	call   0xfc3f2b41
   f:	85 db                	test   %ebx,%ebx
  11:	0f 84 ca 00 00 00    	je     0xe1
  17:	48 8b 44 24 58       	mov    0x58(%rsp),%rax
  1c:	48 8d 98 a0 00 00 00 	lea    0xa0(%rax),%rbx
  23:	48 89 d8             	mov    %rbx,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	42 80 3c 28 00       	cmpb   $0x0,(%rax,%r13,1) <-- trapping instruction
  2f:	74 08                	je     0x39
  31:	48 89 df             	mov    %rbx,%rdi
  34:	e8 e8 ae a5 fc       	call   0xfca5af21
  39:	4c 89 7c 24 10       	mov    %r15,0x10(%rsp)
  3e:	48                   	rex.W
  3f:	8b                   	.byte 0x8b
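
The faulting address in the oops can be decoded by hand. The following is a minimal sketch, assuming the usual generic-KASAN mapping on x86-64 (shadow = (addr >> 3) + offset). The offset is taken from R13 in the register dump and the shift from the "shr $0x3" instruction; the function and macro names are illustrative and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Values read off the register dump and disassembly above: R13 holds the
 * shadow offset, and "shr $0x3" gives the scale shift. Both are assumptions
 * for any other kernel configuration. */
#define SHADOW_OFFSET      0xdffffc0000000000ULL
#define SHADOW_SCALE_SHIFT 3

/* Address of the shadow byte that is checked before accessing @addr. */
static inline uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Lowest memory address covered by the shadow byte at @shadow. */
static inline uint64_t shadow_to_mem(uint64_t shadow)
{
	assert(shadow >= SHADOW_OFFSET);
	return (shadow - SHADOW_OFFSET) << SHADOW_SCALE_SHIFT;
}
```

With these helpers, shadow_addr(0xa0) gives 0xdffffc0000000014, the non-canonical address in the "Oops:" line, and shadow_to_mem(0xdffffc0000000014) gives 0xa0, the start of the range in the "KASAN: null-ptr-deref" line. In other words, the trap is the KASAN check for a load at offset 0xa0 from a NULL pointer (RAX was loaded from the stack and RBX = RAX + 0xa0 = 0xa0).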


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [block?] general protection fault in lo_rw_aio
  2026-04-18  0:02 [syzbot] [block?] general protection fault in lo_rw_aio syzbot
@ 2026-04-21 11:05 ` Tetsuo Handa
  2026-05-11 11:43   ` [PATCH] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-04-21 11:05 UTC (permalink / raw)
  To: syzbot, axboe, linux-block, syzkaller-bugs

I confirmed that this NULL pointer dereference is triggered by
__loop_clr_fd() clearing lo->lo_backing_file before the aio request
is processed. A debug printk() patch at
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/block/loop.c?h=next-20260420&id=7454b88731fd320e7dbfbf1ea72f25705c3befc0
generated the following output.


file == NULL
Last __loop_clr_fd() call was 5 jiffies ago by udevd (16676)
     lo_release+0x35b/0x9d0
     blkdev_release+0x15/0x20 block/fops.c:705
     __fput+0x461/0xa70 fs/file_table.c:510
     fput_close_sync+0x11f/0x240 fs/file_table.c:615
     __do_sys_close fs/open.c:1507 [inline]
     __se_sys_close fs/open.c:1492 [inline]
     __x64_sys_close+0x7e/0x110 fs/open.c:1492
     do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
     do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000014: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
CPU: 0 UID: 0 PID: 180 Comm: kworker/u8:6 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)} 
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Workqueue: loop4 loop_workfn
RIP: 0010:file_inode include/linux/fs.h:1353 [inline]
RIP: 0010:kiocb_start_write include/linux/fs.h:2756 [inline]
RIP: 0010:lo_rw_aio+0xb88/0x1190 drivers/block/loop.c:429
Code: 1e d8 34 fc 45 85 e4 0f 84 d8 00 00 00 48 8b 44 24 58 48 8d 98 a0 00 00 00 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 df e8 8a 4c 9e fc 48 8b 1b 48 83 c3 28 49
RSP: 0018:ffffc90003907660 EFLAGS: 00010206
RAX: 0000000000000014 RBX: 00000000000000a0 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffc900039077d0 R08: 0000000000000000 R09: 0000000000000000
R10: ffffc90003907723 R11: fffff52000720ee6 R12: 0000000000000001
R13: ffff888026ca2c90 R14: 0000000000000000 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff888125eb7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055555b199a28 CR3: 000000003a2c0000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 do_req_filebacked drivers/block/loop.c:461 [inline]
 loop_handle_cmd drivers/block/loop.c:1960 [inline]
 loop_process_work+0x647/0x1560 drivers/block/loop.c:1995
 process_one_work+0x9a3/0x1710 kernel/workqueue.c:3312
 process_scheduled_works kernel/workqueue.c:3403 [inline]
 worker_thread+0xba8/0x11e0 kernel/workqueue.c:3489
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---


Either fput() is called without a corresponding fget(), or some recent change
has altered the timing at which processing of aio requests starts?

Since this problem started after the merge window opened, the culprit commit
might be in between commit a028739a4330 ("Merge tag 'block-7.0-20260305' of
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux") and
commit 7fe6ac157b7e ("Merge tag 'for-7.1/block-20260411' of
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux").
Also, syzbot was not testing changes in linux-next since next-20260403,
and found this problem in next-20260413.



* [PATCH] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-04-21 11:05 ` Tetsuo Handa
@ 2026-05-11 11:43   ` Tetsuo Handa
  2026-05-11 15:58     ` Bart Van Assche
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-11 11:43 UTC (permalink / raw)
  To: Jens Axboe, linux-block, LKML, Christoph Hellwig, Bart Van Assche,
	Damien Le Moal

Summary:
This patch addresses a NULL pointer dereference in lo_rw_aio() by
introducing SRCU-based synchronization and explicit workqueue draining
during device release. This race appears to have been exacerbated or
introduced by recent changes in the block layer's request completion and
freezing logic.

Problem Description:
A NULL pointer dereference was reported by syzbot. The crash occurs when
lo_rw_aio() accesses lo->lo_backing_file, which has already been cleared
by __loop_clr_fd().

The investigation suggests a gap between loop_queue_rq() and the driver's
internal workqueue. Even when the block layer attempts to freeze the queue,
requests that have already passed the loop_queue_rq() state check but have
not yet been queued to lo->workqueue can "leak" and execute after
lo_release() has proceeded to tear down the device.

Suspicious Commits and Behavioral Changes:
We suspect this race became visible due to behavioral changes in how the
block layer handles request completion and synchronization, specifically:

1. Commit 65565ca5f99b ("block: unify the synchronous bi_end_io
   callbacks"): This unified completion path might have altered the timing
   or the visibility of in-flight requests during a queue freeze, allowing
   lo_release() to proceed before the loop driver's internal asynchronous
   work has been fully accounted for.

2. Changes in blk_mq_freeze_queue(): In older kernels, the freeze mechanism
   might have more effectively covered the window between queue_rq and the
   driver's execution of that request. The current behavior seems to allow
   __loop_clr_fd() to run while loop_queue_rq() is still in the middle of
   scheduling work.

Stability and Backporting:
Because the underlying cause is tied to recent block layer refactoring,
this patch should not be backported to older stable kernels without careful
verification, as it may be unnecessary or lead to performance regressions
due to the added SRCU overhead.

Solution:
The patch closes the race window using SRCU:

* loop_queue_rq: Wrapped in srcu_read_lock() to ensure that once a request
  passes the Lo_bound check, the corresponding queue_work() must complete
  before the teardown path can finish its synchronization.

* lo_release: Calls synchronize_srcu() followed by drain_workqueue(). This
  sequence ensures:
  * No new work can be scheduled (lo_state change).
  * All ongoing scheduling calls have finished (synchronize_srcu).
  * All scheduled work has finished executing (drain_workqueue).
  * Finally, it is safe to clear lo_backing_file.
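
The ordering above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: every name (toy_loop, toy_queue_rq, toy_release, and so on) is hypothetical, an atomic reader counter stands in for the SRCU read side, and draining is done inline where the kernel would wait for kworkers. It is single-threaded and only shows why the four steps must happen in this order.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_BOUND, TOY_RUNDOWN };

struct toy_loop {
	atomic_int state;         /* stand-in for lo->lo_state */
	atomic_int readers;       /* crude stand-in for the SRCU read side */
	atomic_int queued;        /* work items queued but not yet run */
	const char *backing_file; /* stand-in for lo->lo_backing_file */
};

static void toy_init(struct toy_loop *lo, const char *file)
{
	atomic_init(&lo->state, TOY_BOUND);
	atomic_init(&lo->readers, 0);
	atomic_init(&lo->queued, 0);
	lo->backing_file = file;
}

/* Models loop_queue_rq(): check state and queue work inside a read section. */
static bool toy_queue_rq(struct toy_loop *lo)
{
	bool queued = false;

	atomic_fetch_add(&lo->readers, 1);        /* ~ srcu_read_lock() */
	if (atomic_load(&lo->state) == TOY_BOUND) {
		atomic_fetch_add(&lo->queued, 1); /* ~ loop_queue_work() */
		queued = true;
	}
	atomic_fetch_sub(&lo->readers, 1);        /* ~ srcu_read_unlock() */
	return queued;
}

/* Models one work item; returns false where the kernel would have oopsed. */
static bool toy_run_one(struct toy_loop *lo)
{
	if (lo->backing_file == NULL)
		return false;
	atomic_fetch_sub(&lo->queued, 1);
	return true;
}

/* Models the teardown order listed above. */
static void toy_release(struct toy_loop *lo)
{
	atomic_store(&lo->state, TOY_RUNDOWN);  /* 1. no new work */
	while (atomic_load(&lo->readers) > 0)   /* 2. ~ synchronize_srcu() */
		;
	while (atomic_load(&lo->queued) > 0 &&  /* 3. ~ drain_workqueue() */
	       toy_run_one(lo))
		;
	lo->backing_file = NULL;                /* 4. now safe to clear */
}
```

In this model, work queued before toy_release() is always run while backing_file is still set, and toy_queue_rq() refuses new work afterwards. Moving step 4 ahead of step 3 would make toy_run_one() see a NULL file, which corresponds to the reported crash.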

Trace Evidence:
Console logs with a debug printk() patch confirm that __loop_clr_fd()
cleared the file for loop3 between multiple lo_rw_aio() requests.

  [  122.956248][ T6148] loop3: detected capacity change from 0 to 32768
  [  122.958217][ T6142] lo_rw_aio(loop3) starting read with raw_refcnt=0x0, refcnt=1
  (...snipped...)
  [  123.234786][   T44] lo_rw_aio(loop3) starting read with raw_refcnt=0x0, refcnt=1
  [  123.254716][ T6148] __loop_clr_fd(loop3) clearing lo_backing_file with raw_refcnt=0x0, refcnt=1
  [  123.265134][  T180] lo_rw_aio(loop3) starting write with NULL file (already cleared?)
  [  123.265221][  T180] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000014: 0000 [#1] SMP KASAN PTI
  [  123.265238][  T180] KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
  [  123.265255][  T180] CPU: 0 UID: 0 PID: 180 Comm: kworker/u8:7 Not tainted syzkaller #0 PREEMPT_{RT,(full)} 
  [  123.265276][  T180] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
  [  123.265287][  T180] Workqueue: loop3 loop_workfn
  [  123.265320][  T180] RIP: 0010:lo_rw_aio+0xd1d/0x1170

Reported-by: syzbot+cd8a9a308e879a4e2c28@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28
Analyzed-by: AI Mode in Google Search (no mail address)
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
Since this race condition is difficult to reproduce, we can't do bisection.
I hope you can figure out what has changed in the block layer for this merge window.
You might want to revert instead of modifying the loop driver.

 drivers/block/loop.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0000913f7efc..9be47ce97dab 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -93,6 +93,7 @@ struct loop_cmd {
 static DEFINE_IDR(loop_index_idr);
 static DEFINE_MUTEX(loop_ctl_mutex);
 static DEFINE_MUTEX(loop_validate_mutex);
+DEFINE_SRCU(loop_io_srcu);
 
 /**
  * loop_global_lock_killable() - take locks for safe loop_validate_file() test
@@ -1747,8 +1748,19 @@ static void lo_release(struct gendisk *disk)
 	need_clear = (lo->lo_state == Lo_rundown);
 	mutex_unlock(&lo->lo_mutex);
 
-	if (need_clear)
+	if (need_clear) {
+		/*
+		 * Now that loop_queue_rq() sees lo->lo_state != Lo_bound,
+		 * wait for already started loop_queue_rq() to complete.
+		 */
+		synchronize_srcu(&loop_io_srcu);
+		/*
+		 * Now that no more works are scheduled by loop_queue_rq(),
+		 * wait for already scheduled works to complete.
+		 */
+		drain_workqueue(lo->workqueue);
 		__loop_clr_fd(lo);
+	}
 }
 
 static void lo_free_disk(struct gendisk *disk)
@@ -1854,11 +1866,15 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct request *rq = bd->rq;
 	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
 	struct loop_device *lo = rq->q->queuedata;
+	int idx;
 
 	blk_mq_start_request(rq);
 
-	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
+	idx = srcu_read_lock(&loop_io_srcu);
+	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound) {
+		srcu_read_unlock(&loop_io_srcu, idx);
 		return BLK_STS_IOERR;
+	}
 
 	switch (req_op(rq)) {
 	case REQ_OP_FLUSH:
@@ -1888,6 +1904,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 #endif
 	loop_queue_work(lo, cmd);
 
+	srcu_read_unlock(&loop_io_srcu, idx);
 	return BLK_STS_OK;
 }
 
-- 
2.54.0




* Re: [PATCH] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-11 11:43   ` [PATCH] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq Tetsuo Handa
@ 2026-05-11 15:58     ` Bart Van Assche
  2026-05-11 17:43       ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2026-05-11 15:58 UTC (permalink / raw)
  To: Tetsuo Handa, Jens Axboe, linux-block, LKML, Christoph Hellwig,
	Damien Le Moal

On 5/11/26 4:43 AM, Tetsuo Handa wrote:
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0000913f7efc..9be47ce97dab 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -93,6 +93,7 @@ struct loop_cmd {
>   static DEFINE_IDR(loop_index_idr);
>   static DEFINE_MUTEX(loop_ctl_mutex);
>   static DEFINE_MUTEX(loop_validate_mutex);
> +DEFINE_SRCU(loop_io_srcu);
>   
>   /**
>    * loop_global_lock_killable() - take locks for safe loop_validate_file() test
> @@ -1747,8 +1748,19 @@ static void lo_release(struct gendisk *disk)
>   	need_clear = (lo->lo_state == Lo_rundown);
>   	mutex_unlock(&lo->lo_mutex);
>   
> -	if (need_clear)
> +	if (need_clear) {
> +		/*
> +		 * Now that loop_queue_rq() sees lo->lo_state != Lo_bound,
> +		 * wait for already started loop_queue_rq() to complete.
> +		 */
> +		synchronize_srcu(&loop_io_srcu);
> +		/*
> +		 * Now that no more works are scheduled by loop_queue_rq(),
> +		 * wait for already scheduled works to complete.
> +		 */
> +		drain_workqueue(lo->workqueue);
>   		__loop_clr_fd(lo);
> +	}
>   }

There is already a mechanism in the block layer to wait for pending
.queue_rq() calls to complete. Please take a look at
blk_mq_quiesce_queue().

>   static void lo_free_disk(struct gendisk *disk)
> @@ -1854,11 +1866,15 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	struct request *rq = bd->rq;
>   	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
>   	struct loop_device *lo = rq->q->queuedata;
> +	int idx;
>   
>   	blk_mq_start_request(rq);
>   
> -	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
> +	idx = srcu_read_lock(&loop_io_srcu);
> +	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound) {
> +		srcu_read_unlock(&loop_io_srcu, idx);
>   		return BLK_STS_IOERR;
> +	}
>   
>   	switch (req_op(rq)) {
>   	case REQ_OP_FLUSH:
> @@ -1888,6 +1904,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
>   #endif
>   	loop_queue_work(lo, cmd);
>   
> +	srcu_read_unlock(&loop_io_srcu, idx);
>   	return BLK_STS_OK;
>   }

Why SRCU instead of RCU? The loop driver doesn't set BLK_MQ_F_BLOCKING
and hence must not sleep inside loop_queue_rq(). Additionally, the block
layer already holds an RCU lock around all loop_queue_rq() calls. From
block/blk-mq.h:

/* run the code block in @dispatch_ops with rcu/srcu read lock held */
#define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)	\
do {								\
	if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {		\
		struct blk_mq_tag_set *__tag_set = (q)->tag_set; \
		int srcu_idx;					\
								\
		might_sleep_if(check_sleep);			\
		srcu_idx = srcu_read_lock(__tag_set->srcu);	\
		(dispatch_ops);					\
		srcu_read_unlock(__tag_set->srcu, srcu_idx);	\
	} else {						\
		rcu_read_lock();				\
		(dispatch_ops);					\
		rcu_read_unlock();				\
	}							\
} while (0)

Thanks,

Bart.


* Re: [PATCH] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-11 15:58     ` Bart Van Assche
@ 2026-05-11 17:43       ` Tetsuo Handa
  2026-05-12 11:46         ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-11 17:43 UTC (permalink / raw)
  To: Bart Van Assche, Andrew Morton
  Cc: Jens Axboe, linux-block, LKML, Christoph Hellwig, Damien Le Moal

Thank you for responding.

Given that it is protected by RCU, might this be yet another manifestation of the
"is trying to release lock (rcu_read_lock) at:" + "but there are no more locks to release!" type of
hidden RCU imbalance bug that was recently introduced? Andrew, has any progress been made on the RCU imbalance bugs?

On 2026/04/21 20:05, Tetsuo Handa wrote:
> Since this problem started after the merge window opened, the culprit commit
> might be in between commit a028739a4330 ("Merge tag 'block-7.0-20260305' of
> git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux") and
> commit 7fe6ac157b7e ("Merge tag 'for-7.1/block-20260411' of
> git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux").
> Also, syzbot was not testing changes in linux-next since next-20260403,
> and found this problem in next-20260413.

On 2026/05/12 0:58, Bart Van Assche wrote:
> Why SRCU instead of RCU? The loop driver doesn't set BLK_MQ_F_BLOCKING
> and hence must not sleep inside loop_queue_rq(). Additionally, the block
> layer already holds an RCU lock around all loop_queue_rq() calls. From
> block/blk-mq.h:
> 
> /* run the code block in @dispatch_ops with rcu/srcu read lock held */
> #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)    \
> do {                                \
>     if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {        \
>         struct blk_mq_tag_set *__tag_set = (q)->tag_set; \
>         int srcu_idx;                    \
>                                 \
>         might_sleep_if(check_sleep);            \
>         srcu_idx = srcu_read_lock(__tag_set->srcu);    \
>         (dispatch_ops);                    \
>         srcu_read_unlock(__tag_set->srcu, srcu_idx);    \
>     } else {                        \
>         rcu_read_lock();                \
>         (dispatch_ops);                    \
>         rcu_read_unlock();                \
>     }                            \
> } while (0)


* Re: [PATCH] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-11 17:43       ` Tetsuo Handa
@ 2026-05-12 11:46         ` Tetsuo Handa
  2026-05-15  1:38           ` [PATCH v2] " Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-12 11:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, LKML, Christoph Hellwig, Damien Le Moal,
	Andrew Morton

Commit 99ebc509eef5 ("mm: memcontrol: fix rcu unbalance in get_non_dying_memcg_end()") fixed
an RCU imbalance bug, and it seems that so far only trees without that commit are reproducing
the RCU imbalance bug.

But since this NULL pointer dereference bug was reproduced in linux-next-20260508, which already
includes that commit, I suspect that some change in the block layer broke the protection provided
by RCU synchronization or by flushing of lo->workqueue.

Can you check whether any commit might have broken the protection provided by RCU synchronization
or by flushing of lo->workqueue?




* [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-12 11:46         ` Tetsuo Handa
@ 2026-05-15  1:38           ` Tetsuo Handa
  2026-05-19  0:40             ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-15  1:38 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, Christoph Hellwig, Damien Le Moal
  Cc: linux-block, LKML, Andrew Morton

The loop driver relies on lo_release() to automatically clear the loop
device via __loop_clr_fd() when the last file descriptor is closed
(LO_FLAGS_AUTOCLEAR). Although the backing file structure itself remains
allocated in memory thanks to proper file reference counting (f_count is
not zero), a severe race condition exists regarding the visibility of
the lo->lo_backing_file pointer.

This race window was exposed by commit 65565ca5f99b ("block: unify
the synchronous bi_end_io callbacks"). Unifying and optimizing
the synchronous I/O completion path significantly changed the timing and
scheduling behavior of the block layer.
As a result, lo_release() can now reach __loop_clr_fd() and clear
lo->lo_backing_file while an already-scheduled asynchronous I/O work
item (lo_rw_aio) is just about to be executed by a kworker thread.

Since the kworker enters lo_rw_aio() after lo->lo_backing_file has been
cleared, it attempts to dereference the now-NULL pointer when initializing
the kiocb, leading to the reported NULL pointer dereference bug.

To close this race safely without introducing heavy fast-path checks,
we must ensure that any running or scheduled dispatch threads have
completed before we nullify the pointer. Since loop_queue_rq() operates
within the block layer's RCU read-side critical section, invoke
synchronize_rcu() and drain_workqueue() in __loop_clr_fd() prior to
clearing lo->lo_backing_file.

Reported-by: syzbot+cd8a9a308e879a4e2c28@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28
Reported-by: syzbot+bc273027d5643e48e5b3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3
Analyzed-by: AI Mode in Google Search (no mail address)
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/block/loop.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0000913f7efc..ff117f340b2f 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1118,6 +1118,17 @@ static void __loop_clr_fd(struct loop_device *lo)
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
 
+	/*
+	 * Now that loop_queue_rq() sees lo->lo_state != Lo_bound,
+	 * wait for already started loop_queue_rq() to complete.
+	 */
+	synchronize_rcu();
+	/*
+	 * Now that no more works are scheduled by loop_queue_rq(),
+	 * wait for already scheduled works to complete.
+	 */
+	drain_workqueue(lo->workqueue);
+
 	spin_lock_irq(&lo->lo_lock);
 	filp = lo->lo_backing_file;
 	lo->lo_backing_file = NULL;
-- 
2.54.0




* Re: [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-15  1:38           ` [PATCH v2] " Tetsuo Handa
@ 2026-05-19  0:40             ` Andrew Morton
  2026-05-19  9:27               ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2026-05-19  0:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML

On Fri, 15 May 2026 10:38:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> The loop driver relies on lo_release() to automatically clear the loop
> device via __loop_clr_fd() when the last file descriptor is closed
> (LO_FLAGS_AUTOCLEAR). Although the backing file structure itself remains
> allocated in memory thanks to proper file reference counting (f_count is
> not zero), a severe race condition exists regarding the visibility of
> the lo->lo_backing_file pointer.
> 
> This race window was exposed by commit 65565ca5f99b ("block: unify
> the synchronous bi_end_io callbacks"). Unifying and optimizing
> the synchronous I/O completion path significantly changed the timing and
> scheduling behavior of the block layer.
> As a result, lo_release() can now reach __loop_clr_fd() and clear
> lo->lo_backing_file while an already-scheduled asynchronous I/O work
> item (lo_rw_aio) is just about to be executed by a kworker thread.
> 
> Since the kworker enters lo_rw_aio() after lo->lo_backing_file has been
> cleared, it attempts to dereference the now-NULL pointer when initializing
> the kiocb, leading to the reported NULL pointer dereference bug.
> 
> To close this race safely without introducing heavy fast-path checks,
> we must ensure that any running or scheduled dispatch threads have
> completed before we nullify the pointer. Since loop_queue_rq() operates
> within the block layer's RCU read-side critical section, invoke
> synchronize_rcu() and drain_workqueue() in __loop_clr_fd() prior to
> clearing lo->lo_backing_file.

AI review asked a couple of questions:
	https://sashiko.dev/#/patchset/9b2032d6-3f36-4d2b-8128-985c08a4fa37@I-love.SAKURA.ne.jp


* Re: [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-19  0:40             ` Andrew Morton
@ 2026-05-19  9:27               ` Tetsuo Handa
  2026-05-20  3:06                 ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-19  9:27 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, Christoph Hellwig, Damien Le Moal
  Cc: linux-block, LKML, Andrew Morton

On 2026/05/19 9:40, Andrew Morton wrote:
> AI review asked a couple of questions:
> 	https://sashiko.dev/#/patchset/9b2032d6-3f36-4d2b-8128-985c08a4fa37@I-love.SAKURA.ne.jp

To: gemini/gemini-3.1-pro-preview

Thank you for your valuable feedback. Your point about asynchronous I/O completing after drain_workqueue()
and potentially causing a UAF at file_inode(), called from kiocb_end_write() in lo_rw_aio_do_completion(), is
correct. drain_workqueue() alone does not wait for in-flight AIOs that have already returned -EIOCBQUEUED.
However, I'm not convinced that calling blk_mq_freeze_queue() inside __loop_clr_fd(), where disk->open_mutex
is already held by bdev_release(), is deadlock-free.

1. VFS and Block Layer Lock Contention:
   __loop_clr_fd() is exclusively invoked from the lo_release() path during the final close of the device.
   At this stage, the block layer is holding disk->open_mutex. If we call blk_mq_freeze_queue() here, it will
   synchronously wait for all in-flight AIOs to complete. However, the completion paths of those in-flight AIOs
   (or subsequent metadata processing in the underlying filesystem) may attempt to acquire resources or execute
   code paths that depend on the very same device state or open/close status. This creates a circular dependency,
   leading to an unrecoverable hang.

2. Memory Reclaim Deadlock:
   blk_mq_freeze_queue() blocks until the queue's usage counter drops to zero. If an in-flight AIO requires memory
   allocation for metadata updates upon completion, and the system is under heavy memory pressure, it can trigger
   direct memory reclaim. If the reclaim path attempts to sync other buffers or interact with the frozen loop
   device/queue, a circular deadlock occurs.

Therefore, I would like to choose SRCU-based synchronization instead of blk_mq_freeze_queue().

* Locking: We call srcu_read_lock(&loop_io_srcu) only for asynchronous paths (cmd->use_aio) immediately
  before submitting the I/O to the underlying filesystem in lo_rw_aio().

* Unlocking: The reader lock is released via srcu_read_unlock() at the very end of the AIO completion handler
  (lo_rw_aio_do_completion()).

* Synchronization: We place synchronize_srcu(&loop_io_srcu) immediately after drain_workqueue() in __loop_clr_fd().

I think that this guarantees that __loop_clr_fd() safely blocks until all pending AIO callbacks are 100% completed,
fully eliminating the UAF risk and ensuring the safety of the subsequent mapping_set_gfp_mask() and fput(), while
remaining entirely deadlock-free.

What do you think about this approach?
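
The ordering described in the three bullets above can be modelled in plain userspace C.
This is only a rough single-threaded sketch, not kernel SRCU: a counter stands in for
active SRCU readers, and every name in it (model_loop, model_submit_aio(),
model_complete_aio(), model_try_clear()) is hypothetical. Where synchronize_srcu()
would sleep, the model merely reports that teardown has to wait.

```c
/* Userspace model only. NOT kernel SRCU; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_loop {
	int inflight_aio;	/* stands in for active SRCU readers */
	const char *backing;	/* stands in for lo->lo_backing_file */
};

/* Submission side: plays the role of srcu_read_lock() in lo_rw_aio(). */
static void model_submit_aio(struct model_loop *lo)
{
	assert(lo->backing != NULL);
	lo->inflight_aio++;
}

/*
 * Completion side: plays the role of srcu_read_unlock() at the end of
 * lo_rw_aio_do_completion(). The completion still needs the backing file.
 */
static void model_complete_aio(struct model_loop *lo)
{
	assert(lo->inflight_aio > 0);
	assert(lo->backing != NULL);
	lo->inflight_aio--;
}

/*
 * Teardown side: a real synchronize_srcu() would block here. The model
 * returns false instead, meaning "would have to wait", and clears the
 * backing pointer only once no AIO is in flight.
 */
static bool model_try_clear(struct model_loop *lo)
{
	if (lo->inflight_aio > 0)
		return false;
	lo->backing = NULL;
	return true;
}

/* Returns how many times teardown was refused in a submit/clear/complete run. */
static int model_demo(void)
{
	struct model_loop lo = { .inflight_aio = 0, .backing = "backing-file" };
	int refused = 0;

	model_submit_aio(&lo);
	if (!model_try_clear(&lo))	/* AIO still in flight: must wait */
		refused++;
	model_complete_aio(&lo);
	if (!model_try_clear(&lo))	/* nothing in flight: may proceed */
		refused++;
	return refused;
}
```

Calling model_demo() from a main() walks through one submit, one refused teardown,
one completion and one successful teardown. The model says nothing about deadlocks
or memory ordering; it only shows which event has to precede which.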

 drivers/block/loop.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0000913f7efc..7c3961f3cbc9 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -80,6 +80,7 @@ struct loop_cmd {
 	struct list_head list_entry;
 	bool use_aio; /* use AIO interface to handle I/O */
 	atomic_t ref; /* only for aio */
+	int srcu_idx;
 	long ret;
 	struct kiocb iocb;
 	struct bio_vec *bvec;
@@ -93,6 +94,7 @@ struct loop_cmd {
 static DEFINE_IDR(loop_index_idr);
 static DEFINE_MUTEX(loop_ctl_mutex);
 static DEFINE_MUTEX(loop_validate_mutex);
+DEFINE_SRCU(loop_io_srcu);
 
 /**
  * loop_global_lock_killable() - take locks for safe loop_validate_file() test
@@ -327,6 +329,8 @@ static void lo_rw_aio_do_completion(struct loop_cmd *cmd)
 		kiocb_end_write(&cmd->iocb);
 	if (likely(!blk_should_fake_timeout(rq->q)))
 		blk_mq_complete_request(rq);
+	if (cmd->use_aio)
+		srcu_read_unlock(&loop_io_srcu, cmd->srcu_idx);
 }
 
 static void lo_rw_aio_complete(struct kiocb *iocb, long ret)
@@ -392,6 +396,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	if (cmd->use_aio) {
 		cmd->iocb.ki_complete = lo_rw_aio_complete;
 		cmd->iocb.ki_flags = IOCB_DIRECT;
+		cmd->srcu_idx = srcu_read_lock(&loop_io_srcu);
 	} else {
 		cmd->iocb.ki_complete = NULL;
 		cmd->iocb.ki_flags = 0;
@@ -1118,6 +1123,22 @@ static void __loop_clr_fd(struct loop_device *lo)
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
 
+	/*
+	 * Now that loop_queue_rq() sees lo->lo_state != Lo_bound,
+	 * wait for already started loop_queue_rq() to complete.
+	 */
+	synchronize_rcu();
+	/*
+	 * Now that no more works are scheduled by loop_queue_rq(),
+	 * wait for already scheduled works to complete.
+	 */
+	drain_workqueue(lo->workqueue);
+	/*
+	 * Now that no more AIO requests are scheduled by lo_rw_aio(),
+	 * wait for already started AIO to complete.
+	 */
+	synchronize_srcu(&loop_io_srcu);
+
 	spin_lock_irq(&lo->lo_lock);
 	filp = lo->lo_backing_file;
 	lo->lo_backing_file = NULL;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-19  9:27               ` Tetsuo Handa
@ 2026-05-20  3:06                 ` Ming Lei
  2026-05-20  6:36                   ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2026-05-20  3:06 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton

On Tue, May 19, 2026 at 06:27:11PM +0900, Tetsuo Handa wrote:
> On 2026/05/19 9:40, Andrew Morton wrote:
> > AI review asked a couple of questions:
> > 	https://sashiko.dev/#/patchset/9b2032d6-3f36-4d2b-8128-985c08a4fa37@I-love.SAKURA.ne.jp
> 
> To: gemini/gemini-3.1-pro-preview
> 
> Thank you for your valuable feedback. Your point about asynchronous I/O completing after drain_workqueue()
> and potentially causing a UAF at file_inode() from kiocb_end_write() from lo_rw_aio_do_completion() is correct.
> The drain_workqueue() alone does not wait for in-flight AIOs that have already returned -EIOCBQUEUED. However,
> I'm not convinced that use of blk_mq_freeze_queue() inside __loop_clr_fd() where disk->open_mutex was already
> held by bdev_release() is absolutely deadlock-free.
> 
> 1. VFS and Block Layer Lock Contention:
>    __loop_clr_fd() is exclusively invoked from the lo_release() path during the final close of the device.
>    At this stage, the block layer is holding disk->open_mutex. If we call blk_mq_freeze_queue() here, it will
>    synchronously wait for all in-flight AIOs to complete. However, the completion paths of those in-flight AIOs
>    (or subsequent metadata processing in the underlying filesystem) may attempt to acquire resources or execute
>    code paths that depend on the very same device state or open/close status. This creates a circular dependency,
>    leading to an unrecoverable hang.
> 
> 2. Memory Reclaim Deadlock:
>    blk_mq_freeze_queue() blocks until the queue's usage counter drops to zero. If an in-flight AIO requires memory
>    allocation for metadata updates upon completion, and the system is under heavy memory pressure, it can trigger
>    direct memory reclaim. If the reclaim path attempts to sync other buffers or interact with the frozen loop
>    device/queue, a circular deadlock occurs.
> 
> Therefore, I would like to choose SRCU-based synchronization instead of blk_mq_freeze_queue().
> 
> * Locking: We call srcu_read_lock(&loop_io_srcu) only for asynchronous paths (cmd->use_aio) immediately
>   before submitting the I/O to the underlying filesystem in lo_rw_aio().
> 
> * Unlocking: The reader lock is released via srcu_read_unlock() at the very end of the AIO completion handler
>   (lo_rw_aio_do_completion()).
> 
> * Synchronization: We place synchronize_srcu(&loop_io_srcu) immediately after drain_workqueue() in __loop_clr_fd().
> 
> I think that this guarantees that __loop_clr_fd() safely blocks until all pending AIO callbacks are 100% completed,
> fully eliminating the UAF risk and ensuring the safety of the subsequent mapping_set_gfp_mask() and fput(), while
> remaining entirely deadlock-free.
> 
> What do you think about this approach?
> 
>  drivers/block/loop.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0000913f7efc..7c3961f3cbc9 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -80,6 +80,7 @@ struct loop_cmd {
>  	struct list_head list_entry;
>  	bool use_aio; /* use AIO interface to handle I/O */
>  	atomic_t ref; /* only for aio */
> +	int srcu_idx;
>  	long ret;
>  	struct kiocb iocb;
>  	struct bio_vec *bvec;
> @@ -93,6 +94,7 @@ struct loop_cmd {
>  static DEFINE_IDR(loop_index_idr);
>  static DEFINE_MUTEX(loop_ctl_mutex);
>  static DEFINE_MUTEX(loop_validate_mutex);
> +DEFINE_SRCU(loop_io_srcu);
>  
>  /**
>   * loop_global_lock_killable() - take locks for safe loop_validate_file() test
> @@ -327,6 +329,8 @@ static void lo_rw_aio_do_completion(struct loop_cmd *cmd)
>  		kiocb_end_write(&cmd->iocb);
>  	if (likely(!blk_should_fake_timeout(rq->q)))
>  		blk_mq_complete_request(rq);
> +	if (cmd->use_aio)
> +		srcu_read_unlock(&loop_io_srcu, cmd->srcu_idx);
>  }
>  
>  static void lo_rw_aio_complete(struct kiocb *iocb, long ret)
> @@ -392,6 +396,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
>  	if (cmd->use_aio) {
>  		cmd->iocb.ki_complete = lo_rw_aio_complete;
>  		cmd->iocb.ki_flags = IOCB_DIRECT;
> +		cmd->srcu_idx = srcu_read_lock(&loop_io_srcu);
>  	} else {
>  		cmd->iocb.ki_complete = NULL;
>  		cmd->iocb.ki_flags = 0;
> @@ -1118,6 +1123,22 @@ static void __loop_clr_fd(struct loop_device *lo)
>  	struct file *filp;
>  	gfp_t gfp = lo->old_gfp_mask;
>  
> +	/*
> +	 * Now that loop_queue_rq() sees lo->lo_state != Lo_bound,
> +	 * wait for already started loop_queue_rq() to complete.
> +	 */
> +	synchronize_rcu();
> +	/*
> +	 * Now that no more works are scheduled by loop_queue_rq(),
> +	 * wait for already scheduled works to complete.
> +	 */
> +	drain_workqueue(lo->workqueue);
> +	/*
> +	 * Now that no more AIO requests are scheduled by lo_rw_aio(),
> +	 * wait for already started AIO to complete.
> +	 */
> +	synchronize_srcu(&loop_io_srcu);

The IO after close(loop) should be from writeback. rcu/srcu isn't necessary,
please see the patch posted in another thread:

https://lore.kernel.org/linux-block/agxJdUf1b0JSDAux@fedora/

in which the check on lo->lo_state is moved to loop_handle_cmd(), and
drain_workqueue() is added to drain in-flight workers.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-20  3:06                 ` Ming Lei
@ 2026-05-20  6:36                   ` Tetsuo Handa
  2026-05-20  7:49                     ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-20  6:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton

On 2026/05/20 12:06, Ming Lei wrote:
> The IO after close(loop) should be from writeback. rcu/srcu isn't necessary,

Gemini's comment is that drain_workqueue() is not sufficient to wait for
do_req_filebacked(REQ_OP_WRITE) requests with cmd->use_aio == true to complete.

We could remove synchronize_rcu() prior to drain_workqueue() if we defer the
lo->lo_state != Lo_bound check to workqueue context (or recheck it in workqueue context).

But I still think that we need to guarantee that all "cmd->use_aio == true" requests (including
ones which had been issued before hitting "WRITE_ONCE(lo->lo_state, Lo_rundown);") have
completed before doing "lo->lo_backing_file = NULL;".

And I don't know whether it is safe to use
"blk_mq_unfreeze_queue(lo->lo_queue, blk_mq_freeze_queue(lo->lo_queue));"
immediately after drain_workqueue() because we are holding disk->open_mutex.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-20  6:36                   ` Tetsuo Handa
@ 2026-05-20  7:49                     ` Ming Lei
  2026-05-20  8:20                       ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2026-05-20  7:49 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton

On Wed, May 20, 2026 at 03:36:12PM +0900, Tetsuo Handa wrote:
> On 2026/05/20 12:06, Ming Lei wrote:
> > The IO after close(loop) should be from writeback. rcu/srcu isn't necessary,
> 
> Gemini's comment is that drain_workqueue() is not sufficient for waiting for
> do_req_filebacked(REQ_OP_WRITE) requests with cmd->use_aio == true case to complete.

Anything cleared in __loop_clr_fd() is not used by lo_rw_aio_complete() & lo_complete_rq().

So why isn't drain_workqueue() enough for cmd->use_aio?


Thanks,
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-20  7:49                     ` Ming Lei
@ 2026-05-20  8:20                       ` Tetsuo Handa
  2026-05-20  8:54                         ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-20  8:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton

On 2026/05/20 16:49, Ming Lei wrote:
> On Wed, May 20, 2026 at 03:36:12PM +0900, Tetsuo Handa wrote:
>> On 2026/05/20 12:06, Ming Lei wrote:
>>> The IO after close(loop) should be from writeback. rcu/srcu isn't necessary,
>>
>> Gemini's comment is that drain_workqueue() is not sufficient for waiting for
>> do_req_filebacked(REQ_OP_WRITE) requests with cmd->use_aio == true case to complete.
> 
> Anything cleared in __loop_clr_fd() is not used by lo_rw_aio_complete() & lo_complete_rq().

"struct inode *inode = file_inode(iocb->ki_filp);" in kiocb_end_write() from
lo_rw_aio_do_completion() can dereference "struct file *" with refcount == 0 (UAF)
because fput() in __loop_clr_fd() can be the last reference to that file.
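
The lifetime problem can be illustrated with a small userspace model. This is only a
sketch under stated assumptions, not kernel code: model_file, model_fput() and
model_completion_hits_freed_file() are hypothetical names, the loop device is assumed
to hold the only reference to the file, and the in-flight AIO is assumed to carry the
pointer without a reference of its own. Instead of really freeing memory, the model
records a "freed" flag so the outcome can be inspected safely.

```c
/* Userspace model only. NOT kernel code; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct model_file {
	int refcount;
	bool freed;	/* the real kernel would free the object here */
};

/* Drop one reference; the last put marks the object as freed. */
static void model_fput(struct model_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->freed = true;
}

/*
 * Returns true if the AIO completion would touch an already-freed file.
 * wait_for_aio_first == true models waiting for in-flight AIO before the
 * teardown drops its reference.
 */
static bool model_completion_hits_freed_file(bool wait_for_aio_first)
{
	/* Assumption: the loop device holds the only reference. */
	struct model_file f = { .refcount = 1, .freed = false };
	/* Assumption: the AIO was queued and holds the pointer, not a reference. */
	bool aio_pending = true;
	bool hit = false;

	if (wait_for_aio_first) {
		/* Completion runs while the file is still alive. */
		hit = f.freed;
		aio_pending = false;
	}

	/* Teardown drops the reference, as fput() does in __loop_clr_fd(). */
	model_fput(&f);

	if (aio_pending) {
		/* Late completion: this is where file_inode() would run. */
		hit = f.freed;
	}
	return hit;
}
```

In this model, model_completion_hits_freed_file(false) reports the late access, while
model_completion_hits_freed_file(true) does not, because the completion has already
run by the time the last reference is dropped.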

> 
> So why isn't drain_workqueue() enough for cmd->use_aio?

In addition to possible UAF above, the assumption at
https://elixir.bootlin.com/linux/v7.1-rc4/source/drivers/block/loop.c#L1134
is currently broken due to this race problem.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq
  2026-05-20  8:20                       ` Tetsuo Handa
@ 2026-05-20  8:54                         ` Ming Lei
  2026-05-25  3:40                           ` [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2026-05-20  8:54 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton

On Wed, May 20, 2026 at 05:20:01PM +0900, Tetsuo Handa wrote:
> On 2026/05/20 16:49, Ming Lei wrote:
> > On Wed, May 20, 2026 at 03:36:12PM +0900, Tetsuo Handa wrote:
> >> On 2026/05/20 12:06, Ming Lei wrote:
> >>> The IO after close(loop) should be from writeback. rcu/srcu isn't necessary,
> >>
> >> Gemini's comment is that drain_workqueue() is not sufficient for waiting for
> >> do_req_filebacked(REQ_OP_WRITE) requests with cmd->use_aio == true case to complete.
> > 
> > Anything cleared in __loop_clr_fd() is not used by lo_rw_aio_complete() & lo_complete_rq().
> 
> "struct inode *inode = file_inode(iocb->ki_filp);" in kiocb_end_write() from
> lo_rw_aio_do_completion() can dereference "struct file *" with refcount == 0 (UAF)
> because fput() in __loop_clr_fd() can be the last reference to that file.

OK, you are right.

It can be handled by adding sync_blockdev(lo->lo_device) in __loop_clr_fd(),
because IO can only come from writeback once the loop disk is closed.

Please feel free to test the following change:

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0000913f7efc..bbd15974a082 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1118,6 +1118,8 @@ static void __loop_clr_fd(struct loop_device *lo)
        struct file *filp;
        gfp_t gfp = lo->old_gfp_mask;

+       sync_blockdev(lo->lo_device);
+
        spin_lock_irq(&lo->lo_lock);
        filp = lo->lo_backing_file;
        lo->lo_backing_file = NULL;

> > 
> > So why isn't drain_workqueue() enough for cmd->use_aio?
> 
> In addition to possible UAF above, the assumption at
> https://elixir.bootlin.com/linux/v7.1-rc4/source/drivers/block/loop.c#L1134
> is currently broken due to this race problem.

That shouldn't be an issue; otherwise bio-based drivers couldn't work when
limits are updated.


Thanks, 
Ming

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-20  8:54                         ` Ming Lei
@ 2026-05-25  3:40                           ` Tetsuo Handa
  2026-05-25 15:19                             ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-25  3:40 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: Bart Van Assche, Christoph Hellwig, Damien Le Moal, linux-block,
	LKML, Andrew Morton

Some commit which was merged in the merge window for 7.1 broke the loop
driver; a race window was introduced where lo_release() clears the backing
file via __loop_clr_fd() while some I/O requests are still pending [1][2].

The exact commit which changed the behavior is not known due to the lack of
a reproducer and the timing-dependent behavior, but it seems that we need to
solve this problem in the loop driver even though there was no change to the
loop driver during this merge window.

To close this race, try to flush pending I/O requests. However, calling
drain_workqueue() from __loop_clr_fd() with disk->open_mutex held causes
lockdep warnings [3][4]. We need to flush pending I/O requests without
disk->open_mutex held.

In the past, commit 322c4293ecc5 ("loop: make autoclear operation
asynchronous") has tried to defer __loop_clr_fd() to WQ context. But it was
reverted by commit bf23747ee053 ("loop: revert "make autoclear operation
asynchronous"") because userspace might be expecting that fput() on the
backing file is processed before lo_release() from close() returns to user
mode.

Therefore, this patch tries to defer __loop_clr_fd() to task work context.
__loop_clr_fd() is split into three steps:

  Step 1: Flush pending I/O requests without holding disk->open_mutex.

  Step 2: Do what __loop_clr_fd() from lo_release() was doing with
          disk->open_mutex held.

  Step 3: Drop refcounts without holding disk->open_mutex.

A potential side effect of this approach is that a userspace program that
issues an open() request before __loop_clr_fd() completes might be confused
by observing -ENXIO, because lo_open() can be called before __loop_clr_fd()
completes.

Except for the side effect above, I expect this patch to work for the
following reasons.

- The existing Lo_rundown state safely guarantees that any subsequent
  lo_open() attempts will immediately fail with -ENXIO, preventing races
  even after disk->open_mutex is temporarily released.

- Since returning from lo_release() normally allows the block layer to
  immediately drop module and device references, this patch explicitly
  increments the refcounts (__module_get() and get_device()) before
  deferring the work, and safely releases them at the end of Step 3
  inside __loop_clr_fd().

- It prefers task_work so that userspace processes expecting immediate
  completion (such as fput() side-effects) see deterministic
  behavior before returning from close(). It falls back to schedule_work()
  if the current context is a kernel thread (PF_KTHREAD) or if
  task_work_add() fails.

Link: https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28 [1]
Link: https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 [2]
Link: https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e [3]
Link: https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc [4]
Analyzed-by: AI Mode in Google Search (no mail address)
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/block/loop.c | 86 ++++++++++++++++++++++++++++++++++++--------
 kernel/task_work.c   |  1 +
 2 files changed, 73 insertions(+), 14 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0000913f7efc..d97aa2c209e3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -36,6 +36,7 @@
 #include <linux/blk-mq.h>
 #include <linux/spinlock.h>
 #include <uapi/linux/loop.h>
+#include <linux/task_work.h>
 
 /* Possible states of device */
 enum {
@@ -74,6 +75,10 @@ struct loop_device {
 	struct gendisk		*lo_disk;
 	struct mutex		lo_mutex;
 	bool			idr_visible;
+	union {
+		struct callback_head lo_clr_task_work;
+		struct work_struct lo_clr_work;
+	};
 };
 
 struct loop_cmd {
@@ -1112,12 +1117,34 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
 	return error;
 }
 
-static void __loop_clr_fd(struct loop_device *lo)
+static void __loop_clr_fd(struct callback_head *callback)
 {
+	struct loop_device *lo = container_of(callback, struct loop_device, lo_clr_task_work);
 	struct queue_limits lim;
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
 
+	/* Step 1: Flush all outstanding I/O, without open_mutex held. */
+
+	/*
+	 * Now that loop_queue_rq() sees lo->lo_state != Lo_bound,
+	 * wait for already started loop_queue_rq() to complete.
+	 */
+	synchronize_rcu();
+	/*
+	 * Now that no more works are scheduled by loop_queue_rq(),
+	 * wait for already scheduled works to complete.
+	 */
+	drain_workqueue(lo->workqueue);
+	/*
+	 * Now that no more AIO requests are scheduled by lo_rw_aio(),
+	 * wait for already started AIO to complete.
+	 */
+	blk_mq_unfreeze_queue(lo->lo_queue, blk_mq_freeze_queue(lo->lo_queue));
+
+	/* Step 2: Perform remaining cleanup, with open_mutex held. */
+	mutex_lock(&lo->lo_disk->open_mutex);
+
 	spin_lock_irq(&lo->lo_lock);
 	filp = lo->lo_backing_file;
 	lo->lo_backing_file = NULL;
@@ -1128,12 +1155,7 @@ static void __loop_clr_fd(struct loop_device *lo)
 	lo->lo_sizelimit = 0;
 	memset(lo->lo_file_name, 0, LO_NAME_SIZE);
 
-	/*
-	 * Reset the block size to the default.
-	 *
-	 * No queue freezing needed because this is called from the final
-	 * ->release call only, so there can't be any outstanding I/O.
-	 */
+	/* Reset the block size to the default. */
 	lim = queue_limits_start_update(lo->lo_queue);
 	lim.logical_block_size = SECTOR_SIZE;
 	lim.physical_block_size = SECTOR_SIZE;
@@ -1145,8 +1167,6 @@ static void __loop_clr_fd(struct loop_device *lo)
 	/* let user-space know about this change */
 	kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
 	mapping_set_gfp_mask(filp->f_mapping, gfp);
-	/* This is safe: open() is still holding a reference. */
-	module_put(THIS_MODULE);
 
 	disk_force_media_change(lo->lo_disk);
 
@@ -1154,9 +1174,6 @@ static void __loop_clr_fd(struct loop_device *lo)
 		int err;
 
 		/*
-		 * open_mutex has been held already in release path, so don't
-		 * acquire it if this function is called in such case.
-		 *
 		 * If the reread partition isn't from release path, lo_refcnt
 		 * must be at least one and it can only become zero when the
 		 * current holder is released.
@@ -1181,12 +1198,31 @@ static void __loop_clr_fd(struct loop_device *lo)
 	WRITE_ONCE(lo->lo_state, Lo_unbound);
 	mutex_unlock(&lo->lo_mutex);
 
+	/* Step 3: Drop refcounts, without open_mutex held. */
+	mutex_unlock(&lo->lo_disk->open_mutex);
+
 	/*
 	 * Need not hold lo_mutex to fput backing file. Calling fput holding
 	 * lo_mutex triggers a circular lock dependency possibility warning as
 	 * fput can take open_mutex which is usually taken before lo_mutex.
 	 */
 	fput(filp);
+
+	/*
+	 * Drop all references that would have been dropped as soon as
+	 * returning from lo_release() and releasing disk->open_mutex.
+	 */
+	module_put(lo->lo_disk->fops->owner);
+	put_device(disk_to_dev(lo->lo_disk));
+
+	module_put(THIS_MODULE);
+}
+
+static void loop_clr_work(struct work_struct *work)
+{
+	struct loop_device *lo = container_of(work, struct loop_device, lo_clr_work);
+
+	__loop_clr_fd(&lo->lo_clr_task_work);
 }
 
 static int loop_clr_fd(struct loop_device *lo)
@@ -1747,8 +1783,30 @@ static void lo_release(struct gendisk *disk)
 	need_clear = (lo->lo_state == Lo_rundown);
 	mutex_unlock(&lo->lo_mutex);
 
-	if (need_clear)
-		__loop_clr_fd(lo);
+	/*
+	 * In order to flush pending I/O requests before clearing the backing device,
+	 * defer __loop_clr_fd() to task work context or normal workqueue context.
+	 * The Lo_rundown state guarantees that lo_open() will fail with -ENXIO.
+	 */
+	if (need_clear) {
+		/*
+		 * Grab all references that will be dropped as soon as returning from
+		 * lo_release() and releasing disk->open_mutex.
+		 */
+		get_device(disk_to_dev(disk));
+		__module_get(disk->fops->owner);
+		/*
+		 * Prefer task work, for userspace might be expecting that fput()
+		 * on the backing file is processed before lo_release() from close()
+		 * returns to user mode.
+		 */
+		init_task_work(&lo->lo_clr_task_work, __loop_clr_fd);
+		if ((current->flags & PF_KTHREAD) ||
+		    task_work_add(current, &lo->lo_clr_task_work, TWA_RESUME)) {
+			INIT_WORK(&lo->lo_clr_work, loop_clr_work);
+			schedule_work(&lo->lo_clr_work);
+		}
+	}
 }
 
 static void lo_free_disk(struct gendisk *disk)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 0f7519f8e7c9..45fd146b85df 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -102,6 +102,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(task_work_add);
 
 /**
  * task_work_cancel_match - cancel a pending work added by task_work_add()
-- 
2.54.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-25  3:40                           ` [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() Tetsuo Handa
@ 2026-05-25 15:19                             ` Ming Lei
  2026-05-26  0:25                               ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2026-05-25 15:19 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton

On Mon, May 25, 2026 at 12:40:19PM +0900, Tetsuo Handa wrote:
> Some commit which was merged in the merge window for 7.1 broke the loop
> driver; a race window where lo_release() clears the backing file via
> __loop_clr_fd() despite some I/O requests are pending was introduced [1][2].
> 
> The exact commit which changed the behavior is not known due to lack of
> reproducer and timing dependent behavior, but it seems that we need to
> solve this problem in the loop driver despite there was no change for the
> loop driver during this merge window.
> 
> To close this race, try to flush pending I/O requests. However, calling
> drain_workqueue() from __loop_clr_fd() with disk->open_mutex held causes
> lockdep warnings [3][4]. We need to flush pending I/O requests without
> disk->open_mutex held.

No, please don't work around this before finding the root cause.

Nothing proves that the issue is in the block layer or the loop driver. The IO
isn't expected; you need to figure out why btrfs still issues IO after this loop
disk is closed by everyone and writeback is done.

https://syzkaller.appspot.com/x/log.txt?x=101e4702580000


Thanks,
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-25 15:19                             ` Ming Lei
@ 2026-05-26  0:25                               ` Tetsuo Handa
  2026-05-27  1:20                                 ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-26  0:25 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: Bart Van Assche, Christoph Hellwig, Damien Le Moal, linux-block,
	LKML, Andrew Morton, Linus Torvalds

On 2026/05/26 0:19, Ming Lei wrote:
> On Mon, May 25, 2026 at 12:40:19PM +0900, Tetsuo Handa wrote:
>> Some commit which was merged in the merge window for 7.1 broke the loop
>> driver; a race window where lo_release() clears the backing file via
>> __loop_clr_fd() despite some I/O requests are pending was introduced [1][2].
>>
>> The exact commit which changed the behavior is not known due to lack of
>> reproducer and timing dependent behavior, but it seems that we need to
>> solve this problem in the loop driver despite there was no change for the
>> loop driver during this merge window.
>>
>> To close this race, try to flush pending I/O requests. However, calling
>> drain_workqueue() from __loop_clr_fd() with disk->open_mutex held causes
>> lockdep warnings [3][4]. We need to flush pending I/O requests without
>> disk->open_mutex held.
> 
> No, please don't workaround before root cause.
> 
> No proof shows that the issue is in block layer or loop driver, the IO isn't
> expected, you need to figure out why btrfs still issues IO after this loop
> disk is closed by everyone and writeback is done.
> 
> https://syzkaller.appspot.com/x/log.txt?x=101e4702580000
> 

Of course we should try to figure out the root cause first, but how can we do that?

  Absolute fact:

    This problem started happening no later than next-20260413 in the linux-next.git tree.
    ( syzbot was unable to test next-202604{03,06,07,08,09,10} due to a different bug. )

    This problem is still happening as of v7.1-rc5 in the linux.git tree.

    No one has succeeded in establishing steps to reproduce this problem.

    No one has identified the exact commit that is causing this problem.

  Likely fact:

    Since this problem did not happen using next-20260402 in the linux-next.git tree until 2026/04/13 16:31,
    this problem did not exist as of next-20260402 in the linux-next.git tree.

    Since this problem did not happen up to v7.0, this problem did not exist as of v7.0.
    (Although last minute changes for v7.0-rc{6,7} or v7.0 could be the culprit,
     the merge window which accepts big changes for v7.1 is more likely.)

  My guess:

    The culprit commit is in between commit a028739a4330 ("Merge tag 'block-7.0-20260305' of
    git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux") and commit 7fe6ac157b7e ("Merge tag
    'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux"), because
    changes related to bio handling were merged in this period.

    "git log --oneline block/ drivers/block/" between next-20260402 and next-20260413 shows the following diff:

--------------------
-da93b347876b Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
-9b75c6e054b7 Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
-265720725a47 Merge branch 'fs-next' of linux-next
-ac9e99118030 Merge branch into tip/master: 'x86/cleanups'
-8ea5c0750d36 zram: do not forget to endio for partial discard requests
-0476d2e93477 zram: change scan_slots to return void
-1207420afea8 zram: propagate read_from_bdev_async() errors
-aafa569edb41 zram: optimize LZ4 dictionary compression performance
-24c76a259819 Merge branch 'for-7.1/block' into for-next
-eca714c0aac1 Merge branch 'vfs-7.1.bh.metadata' into vfs.all
+7d8d908556ca Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
+4391dc7df11d Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
+18c6a4c24187 Merge branch 'fs-next' of linux-next
+7f828a86cfef Merge branch into tip/master: 'x86/cleanups'
+716aa108c5bb zram: reject unrecognized type= values in recompress_store()
+3470a1d34f40 zram: do not forget to endio for partial discard requests
+88a57e158619 Merge branch 'for-7.1/block' into for-next
+36446de0c30c ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
+f2bab85781e8 Merge branch 'vfs-7.1.bh.metadata' into vfs.all
+9357dc97533a Merge branch 'vfs-7.1.integrity' into vfs.all
+e0b15707598c Merge branch 'for-7.1/block' into for-next
+539fb773a3f7 block: refactor blkdev_zone_mgmt_ioctl
+ddc1dfffcbea Merge branch 'for-7.1/block' into for-next
+365ea7cc6244 ublk: allow buffer registration before device is started
+5e864438e285 ublk: replace xarray with IDA for shmem buffer index allocation
+8ea8566a9aee ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
+211ff1602b67 ublk: verify all pages in multi-page bvec fall within registered range
+23b3b6f0b584 ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
+cb793ff1353d Merge branch 'for-7.1/block' into for-next
+92c3737a2473 block: add a bio_submit_or_kill helper
+6fa747550e35 block: factor out a bio_await helper
+65565ca5f99b block: unify the synchronous bi_end_io callbacks
+cc91702dedc5 Merge branch 'for-7.1/block' into for-next
+8a34e88769f6 ublk: eliminate permanent pages[] array from struct ublk_buf
+08677040a911 ublk: enable UBLK_F_SHMEM_ZC feature flag
+4d4a512a1f87 ublk: add PFN-based buffer matching in I/O path
+2fb0ded237bb ublk: add UBLK_U_CMD_REG_BUF/UNREG_BUF control commands
+dec615fa43c3 Merge branch 'for-7.1/block' into for-next
+fa0cac9a5158 drbd: use get_random_u64() where appropriate
+0b581d2fb4cf Merge branch 'for-7.1/block' into for-next
+a9c4b1d37622 drbd: remove DRBD_GENLA_F_MANDATORY flag handling
+d436cfb3a259 Merge branch 'for-7.1/block' into for-next
+e9b004ff8306 blk-wbt: remove WARN_ON_ONCE from wbt_init_enable_default()
+09ebc43b5edc Merge branch 'for-7.1/block' into for-next
+0842186d2c4e ublk: reset per-IO canceled flag on each fetch
+cba82993308d zram: change scan_slots to return void
+bf989ade270d zram: propagate read_from_bdev_async() errors
+f0f6f7871430 zram: optimize LZ4 dictionary compression performance
+301f39220096 zram: unify and harden algo/priority params handling
+cedfa028b54e zram: remove chained recompression
+5004a27edba5 zram: drop ->num_active_comps
+ed19b9d5504f zram: do not autocorrect bad recompression parameters
+241f9005b1c8 zram: do not permit params change after init
+c09fb53d293a zram: use statically allocated compression algorithm names
+6030f93e5c71 Merge branch 'for-7.1/io_uring-fuse' into for-next
+29ebfdd7db89 io_uring/rsrc: rename io_buffer_register_bvec()/io_buffer_unregister_bvec()
+6568edbea553 Merge branch 'for-7.1/block' into for-next
+a175ee827331 block: use sysfs_emit in sysfs show functions
+c691e4b0d80b bio: fix kmemleak false positives from percpu bio alloc cache
 f91ffe89b201 blk-iocost: fix busy_level reset when no IOs complete
 23308af722fe blk-cgroup: fix disk reference leak in blkcg_maybe_throttle_current()
 b2a78fec344e zloop: add max_open_zones option
 2a2f520fda82 block: fix zones_cond memory leak on zone revalidation error paths
 267ec4d7223a loop: fix partition scan race between udev and loop_reread_partitions()
 499d2d2f4cf9 sed-opal: Add STACK_RESET command
-c61825bb46bc Merge branch 'vfs-7.1.integrity' into vfs.all
-fc2093641448 zram: unify and harden algo/priority params handling
-4fd453f16446 zram: remove chained recompression
-e2b717936d1a zram: drop ->num_active_comps
-3578bb37f7d1 zram: do not autocorrect bad recompression parameters
-5331373bfebd zram: do not permit params change after init
 2b31e86387e6 drbd: Balance RCU calls in drbd_adm_dump_devices()
 f9480ecf939d bdev: Drop pointless invalidate_inode_buffers() call
-b00ff1b25f85 zram: use statically allocated compression algorithm names
 630bbba45cfd drbd: use genl pre_doit/post_doit
 829def1e35ca zloop: forget write cache on force removal
 eff8d1656e83 zloop: refactor zloop_rw
--------------------

    "git log --oneline block/" between next-20260402 and next-20260413 shows the following diff:

--------------------
-9b75c6e054b7 Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
-eca714c0aac1 Merge branch 'vfs-7.1.bh.metadata' into vfs.all
+4391dc7df11d Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
+f2bab85781e8 Merge branch 'vfs-7.1.bh.metadata' into vfs.all
+539fb773a3f7 block: refactor blkdev_zone_mgmt_ioctl
+92c3737a2473 block: add a bio_submit_or_kill helper
+6fa747550e35 block: factor out a bio_await helper
+65565ca5f99b block: unify the synchronous bi_end_io callbacks
+e9b004ff8306 blk-wbt: remove WARN_ON_ONCE from wbt_init_enable_default()
+a175ee827331 block: use sysfs_emit in sysfs show functions
+c691e4b0d80b bio: fix kmemleak false positives from percpu bio alloc cache
 f91ffe89b201 blk-iocost: fix busy_level reset when no IOs complete
 23308af722fe blk-cgroup: fix disk reference leak in blkcg_maybe_throttle_current()
 2a2f520fda82 block: fix zones_cond memory leak on zone revalidation error paths
--------------------
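
A comparison like the two listings above could be produced along these lines. This is only a
minimal sketch, not necessarily how the listings were generated: the helper names oneline_log()
and diff_logs() and the max_count default are made up for illustration, and it assumes a
linux-next checkout that contains both tags.

```python
import subprocess
from difflib import SequenceMatcher


def oneline_log(rev, paths, max_count=100):
    """Return `git log --oneline` for REV limited to PATHS (needs a git checkout)."""
    result = subprocess.run(
        ["git", "log", "--oneline", f"--max-count={max_count}", rev, "--", *paths],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def diff_logs(old, new):
    """Diff two commit lists: '-' only in old, '+' only in new, ' ' in both."""
    out = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(" " + line for line in old[i1:i2])
        else:
            # 'replace', 'delete' and 'insert' all reduce to these two slices.
            out.extend("-" + line for line in old[i1:i2])
            out.extend("+" + line for line in new[j1:j2])
    return out
```

For example, with paths = ["block/", "drivers/block/"], calling
diff_logs(oneline_log("next-20260402", paths), oneline_log("next-20260413", paths))
yields lines in the same -/+/space style as the listings. Rebased commits show up as a
removed and an added line with the same subject, because their hashes differ.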

Possible approaches for finding the exact commit that is causing this problem:

  (a) Revert all changes in the block layer from linux.git and monitor for one week to see whether
      this problem still happens (because linux.git hits this problem more frequently than
      linux-next.git).

  (b) Revert all changes in the block layer from linux-next.git and monitor for two weeks to see
      whether this problem still happens (less reliable than linux.git, but a candidate).

  (c) Let sashiko review all changes between v7.0 and v7.1 that may cause this problem.
      (Human developers have no time to review. But is an investigation with a moving baseline
      commit possible for sashiko?)

  (d) Any ideas?

P.S. Since the loop driver is critical infrastructure for testing filesystems with syzbot,
I want this problem to be addressed before 7.1 is released.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-26  0:25                               ` Tetsuo Handa
@ 2026-05-27  1:20                                 ` Ming Lei
  2026-05-27  1:35                                   ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2026-05-27  1:20 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
	David Sterba

On Tue, May 26, 2026 at 09:25:30AM +0900, Tetsuo Handa wrote:
> On 2026/05/26 0:19, Ming Lei wrote:
> > On Mon, May 25, 2026 at 12:40:19PM +0900, Tetsuo Handa wrote:
> >> Some commit which was merged in the merge window for 7.1 broke the loop
> >> driver; a race window where lo_release() clears the backing file via
> >> __loop_clr_fd() despite some I/O requests are pending was introduced [1][2].
> >>
> >> The exact commit which changed the behavior is not known due to lack of
> >> reproducer and timing dependent behavior, but it seems that we need to
> >> solve this problem in the loop driver despite there was no change for the
> >> loop driver during this merge window.
> >>
> >> To close this race, try to flush pending I/O requests. However, calling
> >> drain_workqueue() from __loop_clr_fd() with disk->open_mutex held causes
> >> lockdep warnings [3][4]. We need to flush pending I/O requests without
> >> disk->open_mutex held.
> > 
> > No, please don't workaround before root cause.
> > 
> > No proof shows that the issue is in block layer or loop driver, the IO isn't
> > expected, you need to figure out why btrfs still issues IO after this loop
> > disk is closed by everyone and writeback is done.
> > 
> > https://syzkaller.appspot.com/x/log.txt?x=101e4702580000
> > 
> 
> Of course we should try to figure out the root cause first, but how can we do?

Unexpected write IO (after umount and loop close) from btrfs is definitely more serious, since it
may cause data loss, so I am CCing the btrfs list and maintainer.

...
 
> Possible approaches for finding the exact commit that is causing this problem:
> 
>   (a) Revert all changes in the block layer from linux.git and monitor for one week for whether this
>       problem is still happening (because linux.git is more frequently hitting this problem than
>       linux-next.git ).
> 
>   (b) Revert all changes in the block layer from linux-next.git and monitor for two weeks for
>       whether this problem is still happening (less reliable than linux.git but a candidate).
> 
>   (c) Let sashiko review all changes between v7.0 and v7.1 that may cause this problem.
>       (Human developers have no time to review. But is investigation with moving baseline commit
>       possible for sashiko ?)
> 
>   (d) Any ideas?
> 
> P.S. Since the loop driver is a critical infrastructure for testing filesystems by syzbot,
> I want this problem be addressed before 7.1 is released.

syzbot is for finding real problems; here the real trouble is unexpected write IO from btrfs.

So please do not try to paper over the real bug by 'fixing' loop.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-27  1:20                                 ` Ming Lei
@ 2026-05-27  1:35                                   ` Tetsuo Handa
  2026-05-27  3:00                                     ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-27  1:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
	David Sterba

On 2026/05/27 10:20, Ming Lei wrote:
>> Of course we should try to figure out the root cause first, but how can we do?
> 
> Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious,
> which may cause data loss, so CC btrfs list and maintainer.

Why do you assume that the culprit is btrfs?

https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 indicates that
a similar race is also happening with jfs.

[  678.816570][ T1038] read_mapping_page failed!
[  678.816584][ T1038] ERROR: (device loop3): txCommit: 
[  678.816584][ T1038] 
[  678.816633][ T1038] jfs_write_inode: jfs_commit_inode failed!
[  678.895688][ T2183] lo_rw_aio(loop3) starting write with raw_refcnt=0x0, refcnt=1
[  678.956225][ T2183] lo_rw_aio(loop3) starting write with raw_refcnt=0x0, refcnt=1
[  678.970652][   T12] lo_rw_aio(loop3) starting write with raw_refcnt=0x0, refcnt=1
[  679.102838][ T4281] lo_rw_aio(loop3) starting write with raw_refcnt=0x0, refcnt=1
[  679.104701][ T4281] lo_rw_aio(loop3) starting write with raw_refcnt=0x0, refcnt=1
[  679.121329][ T2183] lo_rw_aio(loop3) starting write with raw_refcnt=0x0, refcnt=1
[  679.122119][ T2183] lo_rw_aio(loop3) starting write with raw_refcnt=0x0, refcnt=1
[  679.199283][ T2183] lo_rw_aio(loop3) starting read with raw_refcnt=0x0, refcnt=1
[  679.200014][ T2183] lo_rw_aio(loop3) starting write with raw_refcnt=0x0, refcnt=1
[  679.275613][ T5615] __loop_clr_fd(loop3) clearing lo_backing_file with raw_refcnt=0x0, refcnt=1
[  679.397358][   T13] bridge_slave_1: left allmulticast mode
[  679.397399][   T13] bridge_slave_1: left promiscuous mode
[  679.410004][   T13] bridge0: port 2(bridge_slave_1) entered disabled state
[  679.433576][ T2183] ------------[ cut here ]------------
[  679.433592][ T2183] d_inode(dentry) != file_inode(file)
[  679.433617][ T2183] WARNING: ./include/linux/fs.h:1368 at file_remove_privs_flags+0x58c/0x640, CPU#0: kworker/u8:12/2183
[  679.433676][ T2183] Modules linked in:
[  679.433695][ T2183] CPU: 0 UID: 0 PID: 2183 Comm: kworker/u8:12 Not tainted syzkaller #0 PREEMPT_{RT,(full)} 
[  679.433720][ T2183] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
[  679.433739][ T2183] Workqueue: loop3 loop_workfn
[  679.433805][ T2183] RIP: 0010:file_remove_privs_flags+0x58c/0x640
[  679.433848][ T2183] Code: 00 75 4d 44 89 e8 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 5f d4 80 ff e9 90 fe ff ff e8 55 d4 80 ff 90 <0f> 0b 90 e9 85 fb ff ff 44 89 f1 80 e1 07 80 c1 03 38 c1 0f 8c b7
[  679.433867][ T2183] RSP: 0018:ffffc90007e374e0 EFLAGS: 00010293
[  679.433885][ T2183] RAX: ffffffff8243f7cb RBX: ffff888036fa8ca0 RCX: ffff88802c0abd80
[  679.433902][ T2183] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  679.433933][ T2183] RBP: ffffc90007e37638 R08: 0000000000000000 R09: 0000000000000000
[  679.433946][ T2183] R10: dffffc0000000000 R11: fffffbfff1f1597f R12: ffff888063726220
[  679.433962][ T2183] R13: 1ffff11006df5194 R14: 0000000000000000 R15: 1ffff1100c6e4c44
[  679.433978][ T2183] FS:  0000000000000000(0000) GS:ffff888125f1f000(0000) knlGS:0000000000000000
[  679.433998][ T2183] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  679.434016][ T2183] CR2: 00007f22e1be7dac CR3: 000000003e332000 CR4: 00000000003526f0
[  679.434038][ T2183] Call Trace:
[  679.434049][ T2183]  <TASK>
[  679.434072][ T2183]  ? __pfx_file_remove_privs_flags+0x10/0x10
[  679.434118][ T2183]  ? rt_mutex_post_schedule+0xd1/0x1c0
[  679.434172][ T2183]  ? generic_write_checks_count+0x449/0x550
[  679.434212][ T2183]  ? generic_write_checks+0xc8/0x110
[  679.434249][ T2183]  shmem_file_write_iter+0xaa/0x120
[  679.434286][ T2183]  lo_rw_aio+0xef0/0x1170
[  679.434349][ T2183]  ? __pfx_lo_rw_aio+0x10/0x10
[  679.434401][ T2183]  ? kthread_associate_blkcg+0x490/0x600
[  679.434432][ T2183]  ? rt_spin_unlock+0x160/0x200
[  679.434476][ T2183]  loop_process_work+0x637/0x11b0
[  679.434539][ T2183]  ? __pfx_loop_process_work+0x10/0x10
[  679.434582][ T2183]  ? look_up_lock_class+0x57/0x110
[  679.434626][ T2183]  ? register_lock_class+0x31/0x2e0
[  679.434661][ T2183]  ? __lock_acquire+0x6b5/0x2d10
[  679.434741][ T2183]  ? do_raw_spin_lock+0x12b/0x2f0
[  679.434785][ T2183]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  679.434830][ T2183]  ? process_one_work+0x8be/0x1630
[  679.434870][ T2183]  ? process_one_work+0x8be/0x1630
[  679.434922][ T2183]  ? process_one_work+0x8be/0x1630
[  679.434959][ T2183]  process_one_work+0x98b/0x1630
[  679.435026][ T2183]  ? __pfx_process_one_work+0x10/0x10
[  679.435060][ T2183]  ? do_raw_spin_lock+0x12b/0x2f0
[  679.435128][ T2183]  worker_thread+0xb49/0x1140
[  679.435202][ T2183]  kthread+0x388/0x470
[  679.435233][ T2183]  ? __pfx_worker_thread+0x10/0x10
[  679.435276][ T2183]  ? __pfx_kthread+0x10/0x10
[  679.435309][ T2183]  ret_from_fork+0x514/0xb70
[  679.435348][ T2183]  ? __pfx_ret_from_fork+0x10/0x10
[  679.435382][ T2183]  ? __switch_to+0xc79/0x1410
[  679.435415][ T2183]  ? __pfx_kthread+0x10/0x10
[  679.435447][ T2183]  ret_from_fork_asm+0x1a/0x30
[  679.435517][ T2183]  </TASK>
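
Read by timestamp, the log above shows loop3 still starting writes, then __loop_clr_fd()
clearing lo_backing_file at 679.275613, and then a warning raised from a loop_workfn worker
inside lo_rw_aio() at 679.433. The toy model below replays that ordering in a single thread.
It is not kernel code: LoopModel, queue_write(), run_worker_once(), clr_fd() and replay() are
names invented for this sketch, and the drain option only illustrates the idea of finishing
pending requests before the backing file is cleared.

```python
from collections import deque


class LoopModel:
    """Toy stand-in for a loop device: one backing file plus a queue of pending writes."""

    def __init__(self):
        self.backing_file = []   # stands in for lo->lo_backing_file
        self.pending = deque()   # requests queued for the worker

    def queue_write(self, data):
        self.pending.append(data)

    def run_worker_once(self):
        """Process one queued request, as a loop worker would."""
        data = self.pending.popleft()
        if self.backing_file is None:
            raise RuntimeError("worker ran after the backing file was cleared")
        self.backing_file.append(data)

    def clr_fd(self, drain):
        """Detach the backing file; optionally finish pending requests first."""
        written = self.backing_file
        if drain:
            while self.pending:
                self.run_worker_once()
        self.backing_file = None
        return written


def replay(drain):
    """Queue two writes, clear the backing file, then let the worker run."""
    lo = LoopModel()
    lo.queue_write("block 0")
    lo.queue_write("block 1")
    written = lo.clr_fd(drain)
    try:
        while lo.pending:
            lo.run_worker_once()
    except RuntimeError as err:
        return written, str(err)
    return written, None
```

With replay(False) the worker trips over the cleared backing file and nothing is written;
with replay(True) both requests complete before the file is detached. The model says nothing
about who issued the late requests, which is the open question in this thread.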


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-27  1:35                                   ` Tetsuo Handa
@ 2026-05-27  3:00                                     ` Ming Lei
  2026-05-27 11:29                                       ` Tetsuo Handa
  2026-05-28  5:43                                       ` Hillf Danton
  0 siblings, 2 replies; 29+ messages in thread
From: Ming Lei @ 2026-05-27  3:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
	David Sterba, linux-fsdevel, Christian Brauner

On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote:
> On 2026/05/27 10:20, Ming Lei wrote:
> >> Of course we should try to figure out the root cause first, but how can we do?
> > 
> > Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious,
> > which may cause data loss, so CC btrfs list and maintainer.
> 
> Why do you assume that the culprit is btrfs?
> 
> https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 indicated that
> this similar race is also happening with jfs.

I just hadn't seen the above report on jfs.

It doesn't change anything; the same question still stands: unexpected write IO is issued after,
or crosses, umount and the last close of the loop disk.



Thanks,
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-27  3:00                                     ` Ming Lei
@ 2026-05-27 11:29                                       ` Tetsuo Handa
  2026-05-27 18:11                                         ` Damien Le Moal
  2026-05-28  5:43                                       ` Hillf Danton
  1 sibling, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-27 11:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
	David Sterba, linux-fsdevel, Christian Brauner

On 2026/05/27 12:00, Ming Lei wrote:
> On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote:
>> On 2026/05/27 10:20, Ming Lei wrote:
>>>> Of course we should try to figure out the root cause first, but how can we do?
>>>
>>> Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious,
>>> which may cause data loss, so CC btrfs list and maintainer.
>>

I had a conversation with Google AI mode, and received the following response.

--------------------------------------------------------------------------------
Technical Analysis: lo_rw_aio Null Pointer Dereference / UAF since v7.1-rc1


1. The Root Cause of the Timing Shift

This regression was introduced during the v7.1-rc1 merge window, primarily exposed by
Commit 65565ca5f99b ("block: unify the synchronous bi_end_io callbacks"), along with
helper refactorings like Commit 92c3737a2473 ("block: add a bio_submit_or_kill helper").

Prior to v7.0, the synchronous I/O completion path inherently contained execution lags (due
to serialized completion handling and context switches) before notifying upper layers. This
latency accidentally acted as a natural safety barrier. It ensured that by the time a file
system completed its final sync_filesystem() and initiated umount, the loop driver's internal
workqueue (lo_rw_aio) had already finished processing everything.

In v7.1, the unification and optimization of bi_end_io significantly minimized this latency.
The filesystem now learns of "I/O completion" much faster. Consequently, highly-concurrent
execution pipelines like btrfs or jfs proceed rapidly through kill_sb() and blkdev_put(),
ultimately invoking lo_release() -> __loop_clr_fd() while the loop driver's backend kworker
is still in the middle of executing the last sub-millisecond asynchronous file-backed I/O
request.


2. Why the Block Layer's Built-in Quiesce/Freeze Fails

There is an implicit assumption that standard block layer freeze mechanisms (blk_mq_freeze_queue())
protect the device lifetime during release. However, the v7.1 BIO helper refactoring introduced
a synchronization gap:

  1. The filesystem triggers its final metadata or journal updates (e.g., txCommit in jfs or
     delayed refcount updates in btrfs) right during the unmount/close boundary.
  2. Due to the optimized execution path, these requests bypass the block layer's active
     request-tracking metrics at the exact moment blk_mq_freeze_queue() or state validation
     checks evaluated them as zero.
  3. The block layer assumes the queue is safe and silent, allowing __loop_clr_fd() to
     progress and nullify lo->lo_backing_file (or trigger fput()).
  4. The leaked asynchronous kworker wakes up a fraction of a millisecond too late, attempts
     to access lo->lo_backing_file or invokes kiocb_end_write() -> file_inode(), leading to
     either a general protection fault (Null pointer dereference) or a Use-After-Free (UAF).


3. Why This Isn't Just an "Unexpected FS Bug"

While the write I/O originates from file systems like btrfs and jfs post-close, blaming the
file systems entirely ignores the underlying infrastructure change. The core issue is that the
block layer altered its synchronization behavior, breaking the barrier contract that
VFS and file systems historically relied on during the device release path.

Papering over this inside individual file systems would require adding heavy, duplicated
barriers inside every single filesystem's unmount path.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-27 11:29                                       ` Tetsuo Handa
@ 2026-05-27 18:11                                         ` Damien Le Moal
  2026-05-28  8:38                                           ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Damien Le Moal @ 2026-05-27 18:11 UTC (permalink / raw)
  To: Tetsuo Handa, Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, linux-block, LKML,
	Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba,
	linux-fsdevel, Christian Brauner

On 2026/05/27 20:29, Tetsuo Handa wrote:
> On 2026/05/27 12:00, Ming Lei wrote:
>> On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote:
>>> On 2026/05/27 10:20, Ming Lei wrote:
>>>>> Of course we should try to figure out the root cause first, but how can we do?
>>>>
>>>> Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious,
>>>> which may cause data loss, so CC btrfs list and maintainer.
>>>
> 
> I had a conversation with Google AI mode, and received the following response.
> 
> --------------------------------------------------------------------------------
> Technical Analysis: lo_rw_aio Null Pointer Dereference / UAF since v7.1-rc1
> 
> 
> 1. The Root Cause of the Timing Shift
> 
> This regression was introduced during the v7.1-rc1 merge window, primarily exposed by
> Commit 65565ca5f99b ("block: unify the synchronous bi_end_io callbacks"), along with
> helper refactorings like Commit 92c3737a2473 ("block: add a bio_submit_or_kill helper").
> 
> Prior to v7.0, the synchronous I/O completion path inherently contained execution lags (due
> to serialized completion handling and context switches) before notifying upper layers. This
> latency accidentally acted as a natural safety barrier. It ensured that by the time a file
> system completed its final sync_filesystem() and initiated umount, the loop driver's internal
> workqueue (lo_rw_aio) had already finished processing everything.
> 
> In v7.1, the unification and optimization of bi_end_io significantly minimized this latency.
> The filesystem now learns of "I/O completion" much faster. Consequently, highly-concurrent
> execution pipelines like btrfs or jfs proceed rapidly through kill_sb() and blkdev_put(),
> ultimately invoking lo_release() -> __loop_clr_fd() while the loop driver's backend kworker
> is still in the middle of executing the last sub-millisecond asynchronous file-backed I/O
> request.
> 
> 
> 2. Why the Block Layer's Built-in Quiesce/Freeze Fails
> 
> There is an implicit assumption that standard block layer freeze mechanisms (blk_mq_freeze_queue())
> protect the device lifetime during release. However, the v7.1 BIO helper refactoring introduced
> a synchronization gap:
> 
>   1. The filesystem triggers its final metadata or journal updates (e.g., txCommit in jfs or
>      delayed refcount updates in btrfs) right during the unmount/close boundary.
>   2. Due to the optimized execution path, these requests bypass the block layer's active
>      request-tracking metrics at the exact moment blk_mq_freeze_queue() or state validation
>      checks evaluated them as zero.
>   3. The block layer assumes the queue is safe and silent, allowing __loop_clr_fd() to
>      progress and nullify lo->lo_backing_file (or trigger fput()).
>   4. The leaked asynchronous kworker wakes up a fraction of a millisecond too late, attempts
>      to access lo->lo_backing_file or invokes kiocb_end_write() -> file_inode(), leading to
>      either a general protection fault (Null pointer dereference) or a Use-After-Free (UAF).
> 
> 
> 3. Why This Isn't Just an "Unexpected FS Bug"
> 
> While the write I/O originates from file systems like btrfs and jfs post-close, blaming the
> file systems entirely ignores the underlying infrastructure change. The core issue is that the
> block layer altered its synchronization behavior, breaking the barrier contract that
> VFS and file systems historically relied on during the device release path.
> 
> Papering over this inside individual file systems would require adding heavy, duplicated
> barriers inside every single filesystem's unmount path.

It sounds like the VFS unmount call needs to have something that waits for
sync() to complete. Though, it really feels very strange that an FS can complete
unmount without itself ensuring that there are no more IOs in flight. The
generic VFS layer cannot know what the FS needs to flush on unmount, so waiting
on a generic sync might not be enough.

It really feels like this is a btrfs and jfs issue, unless the same can be
reproduced with any file system (XFS, ext4, f2fs, ...).

Just my 2 cents.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-27  3:00                                     ` Ming Lei
  2026-05-27 11:29                                       ` Tetsuo Handa
@ 2026-05-28  5:43                                       ` Hillf Danton
  2026-05-28 23:00                                         ` Hillf Danton
  1 sibling, 1 reply; 29+ messages in thread
From: Hillf Danton @ 2026-05-28  5:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: Tetsuo Handa, Jens Axboe, Bart Van Assche, Christoph Hellwig,
	Damien Le Moal, linux-block, LKML, Andrew Morton, Linus Torvalds,
	linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner

On Tue, 26 May 2026 22:00:49 -0500 Ming Lei wrote:
>On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote:
>> On 2026/05/27 10:20, Ming Lei wrote:
>> >> Of course we should try to figure out the root cause first, but how can we do?
>> > 
>> > Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious,
>> > which may cause data loss, so CC btrfs list and maintainer.
>> 
>> Why do you assume that the culprit is btrfs?
>> 
>> https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 indicated that
>> this similar race is also happening with jfs.
>
> I just didn't see the above report on jfs.
> 
> It doesn't change anything, the same question still stands: unexpected write IO is issued
> or crosses umount & last closing of loop disk.
>
Given the loop workqueue that triggered the jfs warning, can you specify
the reason why the workqueue in question is NOT flushed while closing disk?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-27 18:11                                         ` Damien Le Moal
@ 2026-05-28  8:38                                           ` Christoph Hellwig
  2026-05-28 10:16                                             ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2026-05-28  8:38 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche,
	Christoph Hellwig, linux-block, LKML, Andrew Morton,
	Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel,
	Christian Brauner

On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote:
> It sounds like the VFS unmount call needs to have something that waits for
> sync() to complete. Though, it really feels very strange that an FS can complete

I don't think this is the VFS-controlled file data writeback, which
we wait on, but some kind of fs-controlled metadata.  And yes, it looks
like those file systems are buggy in that area.  We definitely had
such bugs in XFS before and fixed them.

e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against
unmount")



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-28  8:38                                           ` Christoph Hellwig
@ 2026-05-28 10:16                                             ` Qu Wenruo
  0 siblings, 0 replies; 29+ messages in thread
From: Qu Wenruo @ 2026-05-28 10:16 UTC (permalink / raw)
  To: Christoph Hellwig, Damien Le Moal
  Cc: Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche, linux-block,
	LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba,
	linux-fsdevel, Christian Brauner



On 2026/5/28 18:08, Christoph Hellwig wrote:
> On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote:
>> It sounds like the VFS unmount call needs to have something that waits for
>> sync() to complete. Though, it really feels very strange that an FS can complete
> 
> I don't think this is the VFS-controlled VFS file data writeback, which
> we wait on, but some kind of fs controlled metadata.  And yes, it looks
> like those file systems are buggy in that area.  We definitively had
> such bugs in XFS before and fixed them.
> 
> e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against
> unmount")
Considering the xfs fix is pretty old, it predates the fix hints, so
there is no such mention in fstests.

Do you happen to know which test case is for that fix?
I'd like to adapt it for btrfs as a reproducer.

This syzbot report doesn't provide a reproducer.


Another thing is, if some btrfs bios are still in flight after
close_ctree(), the most common symptom should be a NULL pointer
dereference inside various btrfs endio functions.
All those end_bbio_*() functions refer to either fs_info or
inode/eb, so if the fs is unmounted before the bio finishes, they
should all cause a use-after-free.

The only exception is discard, which uses blkdev_issue_discard()
and thus holds no such reference to btrfs internal structures, but that is
beyond my understanding.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-28  5:43                                       ` Hillf Danton
@ 2026-05-28 23:00                                         ` Hillf Danton
  2026-05-29  0:14                                           ` Tetsuo Handa
  0 siblings, 1 reply; 29+ messages in thread
From: Hillf Danton @ 2026-05-28 23:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Tetsuo Handa, Jens Axboe, Bart Van Assche, Christoph Hellwig,
	Damien Le Moal, linux-block, LKML, Andrew Morton, Linus Torvalds,
	linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner

On Thu, 28 May 2026 13:43:31 +0800 Hillf Danton wrote:
>On Tue, 26 May 2026 22:00:49 -0500 Ming Lei wrote:
>>On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote:
>>> On 2026/05/27 10:20, Ming Lei wrote:
>>> >> Of course we should try to figure out the root cause first, but how can we do?
>>> > 
>>> > Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious,
>>> > which may cause data loss, so CC btrfs list and maintainer.
>>> 
>>> Why do you assume that the culprit is btrfs?
>>> 
>>> https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 indicated that
>>> this similar race is also happening with jfs.
>>
>> I just didn't see the above report on jfs.
>> 
>> It doesn't change anything, the same question still stands: unexpected write IO is issued
>> or crosses umount & last closing of loop disk.
>>
> Given the loop workqueue that triggered the jfs warning, can you specify
> the reason why the workqueue in question is NOT flushed while closing disk?
>
Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
("loop: don't destroy lo->workqueue in __loop_clr_fd") for details.
And the deadlock can be reproduced by flushing the loop workqueue with
disk->open_mutex held [1].

[1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-28 23:00                                         ` Hillf Danton
@ 2026-05-29  0:14                                           ` Tetsuo Handa
  2026-05-29  7:04                                             ` Hillf Danton
  0 siblings, 1 reply; 29+ messages in thread
From: Tetsuo Handa @ 2026-05-29  0:14 UTC (permalink / raw)
  To: Hillf Danton, Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
	David Sterba, linux-fsdevel, Christian Brauner

On 2026/05/29 8:00, Hillf Danton wrote:
>> Given the loop workqueue that triggered the jfs warning, can you specify
>> the reason why the workqueue in question is NOT flushed while closing disk?
>>
> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
> And the deadlock can be reproduced by flushing the loop workqueue with
> disk->open_mutex held [1].
> 
> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/

We can avoid the following lockdep warnings (including [1] you mentioned)

  https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
  https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
  https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
  https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
  https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4

caused by "drain_workqueue() with disk->open_mutex held" if we assign a
caller-specific lockdep class to disk->open_mutex:

  https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/

Also, we can avoid the lockdep warning caused by "drain_workqueue() with disk->open_mutex held" +
"holding system_transition_mutex" if we forbid binding pseudo files as the backing file
in the loop driver:

  https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@I-love.SAKURA.ne.jp

That warning can be reproduced with:

  echo 7:0 > /sys/power/resume
  losetup /dev/loop0 /sys/power/resume
  cat /dev/loop0 > /dev/null
  losetup -d /dev/loop0

Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
held" on the loop driver side.

However, the possibility will remain that a last-millisecond writeback request
from the filesystem (issued during the unmount operation) fails due to the

    if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
        return BLK_STS_IOERR;

check in loop_queue_rq(). Therefore, addressing this problem within each
individual filesystem would be the stricter solution. But judging from the
pace at which jfs fixes bugs, it would take a long time before we stop seeing
this problem...
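
To make the remaining failure mode concrete, here is a minimal userspace
sketch of it. It is not the driver code: every identifier prefixed with
model_ or MODEL_ is hypothetical, the state names other than "bound" are
placeholders, and only the shape of the check is taken from the snippet
above. It shows that a request arriving after the device has left the
bound state is rejected rather than written.

```c
#include <assert.h>

/* Hypothetical userspace model, not kernel code. Only the idea of
 * "state != bound => I/O error" comes from the check quoted above. */
enum model_lo_state { MODEL_LO_UNBOUND, MODEL_LO_BOUND, MODEL_LO_RUNDOWN };
enum model_status { MODEL_STS_OK, MODEL_STS_IOERR };

struct model_loop {
	enum model_lo_state state;
};

/* Same shape as the quoted check: any state other than "bound"
 * rejects the request. */
enum model_status model_queue_rq(const struct model_loop *lo)
{
	if (lo->state != MODEL_LO_BOUND)
		return MODEL_STS_IOERR;
	return MODEL_STS_OK;
}

/* Models the last close of the loop device starting teardown
 * (assumed here to simply move the device out of the bound state). */
void model_release(struct model_loop *lo)
{
	lo->state = MODEL_LO_RUNDOWN;
}
```

In this model a writeback request queued before model_release() is
accepted, while the same request queued afterwards gets
MODEL_STS_IOERR, which is the situation a filesystem-side fix would
have to rule out.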


* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-29  0:14                                           ` Tetsuo Handa
@ 2026-05-29  7:04                                             ` Hillf Danton
  2026-05-29 22:05                                               ` Hillf Danton
  0 siblings, 1 reply; 29+ messages in thread
From: Hillf Danton @ 2026-05-29  7:04 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	Ming Lei, linux-block, LKML, Andrew Morton, Linus Torvalds,
	linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner

On Fri, 29 May 2026 09:14:47 +0900 Tetsuo Handa wrote:
>On 2026/05/29 8:00, Hillf Danton wrote:
>>> Given the loop workqueue that triggered the jfs warning, can you specify
>>> the reason why the workqueue in question is NOT flushed while closing disk?
>>>
>> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
>> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
>> And the deadlock can be reproduced by flushing the loop workqueue with
>> disk->open_mutex held [1].
>> 
>> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
>> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/
>
>We can avoid the following lockdep warnings (including [1] you mentioned)
>
>  https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
>  https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
>  https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
>  https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
>  https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4
>
>caused by "drain_workqueue() with disk->open_mutex held" if we assign
>caller-specific lockdep class to disk->open_mutex
>
>  https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/
>
>.
>
>Also, we can avoid lockdep warning caused by "drain_workqueue() with disk->open_mutex held" +
>"holding system_transition_mutex" if we forbid binding to pseudo files as backing file
>in the loop driver
>
>  https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@I-love.SAKURA.ne.jp
>
>which we can reproduce with
>
>  echo 7:0 > /sys/power/resume
>  losetup /dev/loop0 /sys/power/resume
>  cat /dev/loop0 > /dev/null
>  losetup -d /dev/loop0
>
>.
>
>Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
>held" in the loop driver side.
>
Good news.
>
>
>However, the possibility that the last milli-second writeback request
>(which runs during unmount operation) from filesystem fails due to
>
>    if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
>        return BLK_STS_IOERR;
>
>check in loop_queue_rq() will remain.

This conflicts with the statement "There is no need to destroy the workqueue when
clearing unbinding a loop device from a backing file." in d292dc80686a.

>Therefore, addressing this problem
>within individual filesystem will be more strict solution. But guessing from

This in turn conflicts with "Another thing is, if it's some btrfs bios on-the-fly after
close_ctree(), the most common symptom should be NULL pointer
dereference inside various btrfs endio functions." [2].

And I think you would need to pay the fs guys more than two cents for
cooking up a fix.

[2] Subject: Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
https://lore.kernel.org/lkml/36571f8a-4df8-4152-b078-d82dbff4ad7e@suse.com/

>the pace jfs fixes bugs, it would take long time before we stop seeing
>this problem...

* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
  2026-05-29  7:04                                             ` Hillf Danton
@ 2026-05-29 22:05                                               ` Hillf Danton
  0 siblings, 0 replies; 29+ messages in thread
From: Hillf Danton @ 2026-05-29 22:05 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	Ming Lei, linux-block, LKML, Andrew Morton, Linus Torvalds,
	linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner,
	syzbot+78ad2c6a58c0a1faa5f5

On Fri, 29 May 2026 15:04:10 +0800 Hillf Danton wrote:
>On Fri, 29 May 2026 09:14:47 +0900 Tetsuo Handa wrote:
>>On 2026/05/29 8:00, Hillf Danton wrote:
>>>> Given the loop workqueue that triggered the jfs warning, can you specify
>>>> the reason why the workqueue in question is NOT flushed while closing disk?
>>>>
>>> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
>>> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
>>> And the deadlock can be reproduced by flushing the loop workqueue with
>>> disk->open_mutex held [1].
>>> 
>>> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
>>> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/
>>
>>We can avoid the following lockdep warnings (including [1] you mentioned)
>>
>>  https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
>>  https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
>>  https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
>>  https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
>>  https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4
>>
>>caused by "drain_workqueue() with disk->open_mutex held" if we assign
>>caller-specific lockdep class to disk->open_mutex
>>
>>  https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/
>>
>>.
>>
>>Also, we can avoid lockdep warning caused by "drain_workqueue() with disk->open_mutex held" +
>>"holding system_transition_mutex" if we forbid binding to pseudo files as backing file
>>in the loop driver
>>
>>  https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@I-love.SAKURA.ne.jp
>>
>>which we can reproduce with
>>
>>  echo 7:0 > /sys/power/resume
>>  losetup /dev/loop0 /sys/power/resume
>>  cat /dev/loop0 > /dev/null
>>  losetup -d /dev/loop0
>>
>>.
>>
>> Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
>> held" in the loop driver side.
>>
> Good news.
>
Bad news, see [3].

[3] Subject: [syzbot] [block?] possible deadlock in loop_process_work
https://lore.kernel.org/lkml/6a19f5f7.5099cdd9.8e407.0004.GAE@google.com/

syzbot found the following issue on:

HEAD commit:    c1ecb239fa34 Add linux-next specific files for 20260522
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12fa6336580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=77a9211ff284de54
dashboard link: https://syzkaller.appspot.com/bug?extid=78ad2c6a58c0a1faa5f5
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/4cb88c910144/disk-c1ecb239.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/4a9bc938cf88/vmlinux-c1ecb239.xz
kernel image: https://storage.googleapis.com/syzbot-assets/684f1e33f264/bzImage-c1ecb239.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+78ad2c6a58c0a1faa5f5@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G             L
------------------------------------------------------
kworker/u8:15/1491 is trying to acquire lock:
ffff88805e1a6480 (sb_writers#5){.+.+}-{0:0}, at: do_req_filebacked drivers/block/loop.c:433 [inline]
ffff88805e1a6480 (sb_writers#5){.+.+}-{0:0}, at: loop_handle_cmd drivers/block/loop.c:1941 [inline]
ffff88805e1a6480 (sb_writers#5){.+.+}-{0:0}, at: loop_process_work+0x637/0x11b0 drivers/block/loop.c:1976

but task is already holding lock:
ffffc90006e27c40 ((work_completion)(&worker->work)){+.+.}-{0:0}, at: process_one_work+0x8be/0x1630 kernel/workqueue.c:3294

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #7 ((work_completion)(&worker->work)){+.+.}-{0:0}:
       process_one_work+0x8d7/0x1630 kernel/workqueue.c:3294
       process_scheduled_works kernel/workqueue.c:3401 [inline]
       worker_thread+0xb49/0x1140 kernel/workqueue.c:3482
       kthread+0x388/0x470 kernel/kthread.c:436
       ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #6 ((wq_completion)loop4){+.+.}-{0:0}:
       touch_wq_lockdep_map+0xcb/0x180 kernel/workqueue.c:4033
       __flush_workqueue+0x14b/0x14f0 kernel/workqueue.c:4075
       drain_workqueue+0xd3/0x390 kernel/workqueue.c:4239
       __loop_clr_fd drivers/block/loop.c:1130 [inline]
       lo_release+0x287/0x8f0 drivers/block/loop.c:1767
       bdev_release+0x541/0x660 block/bdev.c:-1
       blkdev_release+0x15/0x20 block/fops.c:705
       __fput+0x461/0xa70 fs/file_table.c:510
       fput_close_sync+0x11f/0x240 fs/file_table.c:615
       __do_sys_close fs/open.c:1511 [inline]
       __se_sys_close fs/open.c:1496 [inline]
       __x64_sys_close+0x7e/0x110 fs/open.c:1496
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #5 (&disk->open_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/rtmutex_api.c:559 [inline]
       mutex_lock_nested+0x5a/0x1d0 kernel/locking/rtmutex_api.c:578
       __del_gendisk+0x127/0x980 block/genhd.c:710
       del_gendisk+0xe7/0x160 block/genhd.c:823
       nbd_dev_remove drivers/block/nbd.c:268 [inline]
       nbd_dev_remove_work+0x47/0xe0 drivers/block/nbd.c:284
       process_one_work+0x98b/0x1630 kernel/workqueue.c:3318
       process_scheduled_works kernel/workqueue.c:3401 [inline]
       worker_thread+0xb49/0x1140 kernel/workqueue.c:3482
       kthread+0x388/0x470 kernel/kthread.c:436
       ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #4 (&set->update_nr_hwq_lock){++++}-{4:4}:
       down_read+0x97/0x200 kernel/locking/rwsem.c:1568
       add_disk_fwnode+0xe7/0x480 block/genhd.c:596
       add_disk include/linux/blkdev.h:794 [inline]
       nbd_dev_add+0x72c/0xb50 drivers/block/nbd.c:1984
       nbd_genl_connect+0x965/0x1c80 drivers/block/nbd.c:2125
       genl_family_rcv_msg_doit+0x22a/0x330 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x61c/0x7a0 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2551
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x780/0x920 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1895
       sock_sendmsg_nosec+0x112/0x150 net/socket.c:797
       __sock_sendmsg net/socket.c:812 [inline]
       ____sys_sendmsg+0x55c/0x870 net/socket.c:2716
       ___sys_sendmsg+0x2a5/0x360 net/socket.c:2770
       __sys_sendmsg net/socket.c:2802 [inline]
       __do_sys_sendmsg net/socket.c:2807 [inline]
       __se_sys_sendmsg net/socket.c:2805 [inline]
       __x64_sys_sendmsg+0x1c3/0x2a0 net/socket.c:2805
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #3 (genl_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/rtmutex_api.c:559 [inline]
       mutex_lock_nested+0x5a/0x1d0 kernel/locking/rtmutex_api.c:578
       genl_lock net/netlink/genetlink.c:35 [inline]
       genl_lock_all net/netlink/genetlink.c:48 [inline]
       genl_register_family+0x7b9/0x17b0 net/netlink/genetlink.c:784
       vdpa_init+0x39/0x70 drivers/vdpa/vdpa.c:1565
       do_one_initcall+0x250/0x870 init/main.c:1347
       do_initcall_level+0x104/0x190 init/main.c:1409
       do_initcalls+0x59/0xa0 init/main.c:1425
       kernel_init_freeable+0x2a6/0x3e0 init/main.c:1658
       kernel_init+0x1d/0x1d0 init/main.c:1548
       ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #2 (cb_lock){++++}-{4:4}:
       down_read+0x97/0x200 kernel/locking/rwsem.c:1568
       genl_rcv+0x19/0x40 net/netlink/genetlink.c:1217
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x780/0x920 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1895
       sock_sendmsg_nosec+0x112/0x150 net/socket.c:797
       __sock_sendmsg net/socket.c:812 [inline]
       sock_sendmsg+0x1ca/0x2d0 net/socket.c:835
       splice_to_socket+0xae5/0x11f0 fs/splice.c:884
       do_splice_from fs/splice.c:936 [inline]
       do_splice+0xef8/0x1940 fs/splice.c:1349
       __do_splice fs/splice.c:1431 [inline]
       __do_sys_splice fs/splice.c:1634 [inline]
       __se_sys_splice+0x353/0x490 fs/splice.c:1616
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (&pipe->mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/rtmutex_api.c:559 [inline]
       mutex_lock_nested+0x5a/0x1d0 kernel/locking/rtmutex_api.c:578
       iter_file_splice_write+0x1f3/0x10f0 fs/splice.c:682
       do_splice_from fs/splice.c:936 [inline]
       do_splice+0xef8/0x1940 fs/splice.c:1349
       __do_splice fs/splice.c:1431 [inline]
       __do_sys_splice fs/splice.c:1634 [inline]
       __se_sys_splice+0x353/0x490 fs/splice.c:1616
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (sb_writers#5){.+.+}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3167 [inline]
       check_prevs_add kernel/locking/lockdep.c:3286 [inline]
       validate_chain kernel/locking/lockdep.c:3910 [inline]
       __lock_acquire+0x15a5/0x2d10 kernel/locking/lockdep.c:5239
       lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5870
       percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
       percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
       __sb_start_write include/linux/fs/super.h:19 [inline]
       sb_start_write include/linux/fs/super.h:125 [inline]
       kiocb_start_write include/linux/fs.h:2767 [inline]
       lo_rw_aio+0xb1b/0xf00 drivers/block/loop.c:401
       do_req_filebacked drivers/block/loop.c:433 [inline]
       loop_handle_cmd drivers/block/loop.c:1941 [inline]
       loop_process_work+0x637/0x11b0 drivers/block/loop.c:1976
       process_one_work+0x98b/0x1630 kernel/workqueue.c:3318
       process_scheduled_works kernel/workqueue.c:3401 [inline]
       worker_thread+0xb49/0x1140 kernel/workqueue.c:3482
       kthread+0x388/0x470 kernel/kthread.c:436
       ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

other info that might help us debug this:

Chain exists of:
  sb_writers#5 --> (wq_completion)loop4 --> (work_completion)(&worker->work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&worker->work));
                               lock((wq_completion)loop4);
                               lock((work_completion)(&worker->work));
  rlock(sb_writers#5);

 *** DEADLOCK ***
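
For readers following the chain in the report above, the sketch below
replays it as a small dependency graph in userspace C. It is only an
illustration of the idea behind the warning, not lockdep's actual
implementation: the enum, the dep_* helpers and the adjacency matrix
are all hypothetical, and the edges are read off entries #1 to #7 of
the report (each lock acquired while the previous one was held). The
new acquisition, sb_writers taken inside the loop work item, is
flagged because sb_writers can already reach work_completion through
the existing edges.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes named in the report; the enum itself is illustrative. */
enum lock_class {
	SB_WRITERS,
	PIPE_MUTEX,
	CB_LOCK,
	GENL_MUTEX,
	UPDATE_NR_HWQ_LOCK,
	OPEN_MUTEX,
	WQ_COMPLETION_LOOP4,
	WORK_COMPLETION,
	NR_LOCK_CLASSES,
};

struct dep_graph {
	/* edge[a][b] is true when b was acquired while a was held */
	bool edge[NR_LOCK_CLASSES][NR_LOCK_CLASSES];
};

void dep_add(struct dep_graph *g, enum lock_class held,
	     enum lock_class acquired)
{
	g->edge[held][acquired] = true;
}

/* Depth-first search: can "to" be reached from "from"? */
bool dep_reaches_visit(const struct dep_graph *g, int from, int to,
		       bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    dep_reaches_visit(g, next, to, seen))
			return true;
	}
	return false;
}

/* Taking "acquired" while holding "held" closes a cycle if and only if
 * "acquired" already reaches "held" through recorded dependencies. */
bool dep_would_cycle(const struct dep_graph *g, enum lock_class held,
		     enum lock_class acquired)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	return dep_reaches_visit(g, acquired, held, seen);
}

/* Existing dependencies, entries #1..#7 of the report. */
void dep_load_reported_chain(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
	dep_add(g, SB_WRITERS, PIPE_MUTEX);                /* #1 */
	dep_add(g, PIPE_MUTEX, CB_LOCK);                   /* #2 */
	dep_add(g, CB_LOCK, GENL_MUTEX);                   /* #3 */
	dep_add(g, GENL_MUTEX, UPDATE_NR_HWQ_LOCK);        /* #4 */
	dep_add(g, UPDATE_NR_HWQ_LOCK, OPEN_MUTEX);        /* #5 */
	dep_add(g, OPEN_MUTEX, WQ_COMPLETION_LOOP4);       /* #6 */
	dep_add(g, WQ_COMPLETION_LOOP4, WORK_COMPLETION);  /* #7 */
}
```

After dep_load_reported_chain(), asking
dep_would_cycle(&g, WORK_COMPLETION, SB_WRITERS) reports a cycle, which
corresponds to entry #0. Note that the edge #6 only exists because
drain_workqueue() is called with disk->open_mutex held, and that edges
#2 to #5 come from nbd and genetlink rather than from the loop driver.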


Thread overview: 29+ messages
2026-04-18  0:02 [syzbot] [block?] general protection fault in lo_rw_aio syzbot
2026-04-21 11:05 ` Tetsuo Handa
2026-05-11 11:43   ` [PATCH] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq Tetsuo Handa
2026-05-11 15:58     ` Bart Van Assche
2026-05-11 17:43       ` Tetsuo Handa
2026-05-12 11:46         ` Tetsuo Handa
2026-05-15  1:38           ` [PATCH v2] " Tetsuo Handa
2026-05-19  0:40             ` Andrew Morton
2026-05-19  9:27               ` Tetsuo Handa
2026-05-20  3:06                 ` Ming Lei
2026-05-20  6:36                   ` Tetsuo Handa
2026-05-20  7:49                     ` Ming Lei
2026-05-20  8:20                       ` Tetsuo Handa
2026-05-20  8:54                         ` Ming Lei
2026-05-25  3:40                           ` [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() Tetsuo Handa
2026-05-25 15:19                             ` Ming Lei
2026-05-26  0:25                               ` Tetsuo Handa
2026-05-27  1:20                                 ` Ming Lei
2026-05-27  1:35                                   ` Tetsuo Handa
2026-05-27  3:00                                     ` Ming Lei
2026-05-27 11:29                                       ` Tetsuo Handa
2026-05-27 18:11                                         ` Damien Le Moal
2026-05-28  8:38                                           ` Christoph Hellwig
2026-05-28 10:16                                             ` Qu Wenruo
2026-05-28  5:43                                       ` Hillf Danton
2026-05-28 23:00                                         ` Hillf Danton
2026-05-29  0:14                                           ` Tetsuo Handa
2026-05-29  7:04                                             ` Hillf Danton
2026-05-29 22:05                                               ` Hillf Danton
