* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() [not found] ` <d1b5a737-f0e3-4927-b762-430b37fbb2f9@I-love.SAKURA.ne.jp> @ 2026-05-27 3:00 ` Ming Lei 2026-05-27 11:29 ` Tetsuo Handa 2026-05-28 5:43 ` Hillf Danton 0 siblings, 2 replies; 21+ messages in thread From: Ming Lei @ 2026-05-27 3:00 UTC (permalink / raw) To: Tetsuo Handa Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote: > On 2026/05/27 10:20, Ming Lei wrote: > >> Of course we should try to figure out the root cause first, but how can we do? > > > > Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious, > > which may cause data loss, so CC btrfs list and maintainer. > > Why do you assume that the culprit is btrfs? > > https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 indicated that > this similar race is also happening with jfs. I just didn't see the above report on jfs. It doesn't change anything, the same question still stands: unexpected write IO is issued or crosses umount & last closing of loop disk. Thanks, Ming ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-05-27 3:00 ` [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() Ming Lei @ 2026-05-27 11:29 ` Tetsuo Handa 2026-05-27 18:11 ` Damien Le Moal 2026-05-28 5:43 ` Hillf Danton 1 sibling, 1 reply; 21+ messages in thread From: Tetsuo Handa @ 2026-05-27 11:29 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On 2026/05/27 12:00, Ming Lei wrote: > On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote: >> On 2026/05/27 10:20, Ming Lei wrote: >>>> Of course we should try to figure out the root cause first, but how can we do? >>> >>> Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious, >>> which may cause data loss, so CC btrfs list and maintainer. >> I had a conversation with Google AI mode, and received the following response. -------------------------------------------------------------------------------- Technical Analysis: lo_rw_aio Null Pointer Dereference / UAF since v7.1-rc1 1. The Root Cause of the Timing Shift This regression was introduced during the v7.1-rc1 merge window, primarily exposed by Commit 65565ca5f99b ("block: unify the synchronous bi_end_io callbacks"), along with helper refactorings like Commit 92c3737a2473 ("block: add a bio_submit_or_kill helper"). Prior to v7.0, the synchronous I/O completion path inherently contained execution lags (due to serialized completion handling and context switches) before notifying upper layers. This latency accidentally acted as a natural safety barrier. It ensured that by the time a file system completed its final sync_filesystem() and initiated umount, the loop driver's internal workqueue (lo_rw_aio) had already finished processing everything. In v7.1, the unification and optimization of bi_end_io significantly minimized this latency. 
The filesystem now learns of "I/O completion" much faster. Consequently, highly-concurrent execution pipelines like btrfs or jfs proceed rapidly through kill_sb() and blkdev_put(), ultimately invoking lo_release() -> __loop_clr_fd() while the loop driver's backend kworker is still in the middle of executing the last sub-millisecond asynchronous file-backed I/O request. 2. Why the Block Layer's Built-in Quiesce/Freeze Fails There is an implicit assumption that standard block layer freeze mechanisms (blk_mq_freeze_queue()) protect the device lifetime during release. However, the v7.1 BIO helper refactoring introduced a synchronization gap: 1. The filesystem triggers its final metadata or journal updates (e.g., txCommit in jfs or delayed refcount updates in btrfs) right during the unmount/close boundary. 2. Due to the optimized execution path, these requests bypass the block layer's active request-tracking metrics at the exact moment blk_mq_freeze_queue() or state validation checks evaluated them as zero. 3. The block layer assumes the queue is safe and silent, allowing __loop_clr_fd() to progress and nullify lo->lo_backing_file (or trigger fput()). 4. The leaked asynchronous kworker wakes up a fraction of a millisecond too late, attempts to access lo->lo_backing_file or invokes kiocb_end_write() -> file_inode(), leading to either a general protection fault (Null pointer dereference) or a Use-After-Free (UAF). 3. Why This Isn't Just an "Unexpected FS Bug" While the write I/O originates from file systems like btrfs and jfs post-close, blaming the file systems entirely ignores the underlying infrastructure change. The core issue is that the block layer altered its synchronization behavior, breaking the barrier contract that VFS and file systems historically relied on during the device release path. Papering over this inside individual file systems would require adding heavy, duplicated barriers inside every single filesystem's unmount path. 
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-05-27 11:29 ` Tetsuo Handa @ 2026-05-27 18:11 ` Damien Le Moal 2026-05-28 8:38 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Damien Le Moal @ 2026-05-27 18:11 UTC (permalink / raw) To: Tetsuo Handa, Ming Lei Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On 2026/05/27 20:29, Tetsuo Handa wrote: > On 2026/05/27 12:00, Ming Lei wrote: >> On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote: >>> On 2026/05/27 10:20, Ming Lei wrote: >>>>> Of course we should try to figure out the root cause first, but how can we do? >>>> >>>> Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious, >>>> which may cause data loss, so CC btrfs list and maintainer. >>> > > I had a conversation with Google AI mode, and received the following response. > > -------------------------------------------------------------------------------- > Technical Analysis: lo_rw_aio Null Pointer Dereference / UAF since v7.1-rc1 > > > 1. The Root Cause of the Timing Shift > > This regression was introduced during the v7.1-rc1 merge window, primarily exposed by > Commit 65565ca5f99b ("block: unify the synchronous bi_end_io callbacks"), along with > helper refactorings like Commit 92c3737a2473 ("block: add a bio_submit_or_kill helper"). > > Prior to v7.0, the synchronous I/O completion path inherently contained execution lags (due > to serialized completion handling and context switches) before notifying upper layers. This > latency accidentally acted as a natural safety barrier. It ensured that by the time a file > system completed its final sync_filesystem() and initiated umount, the loop driver's internal > workqueue (lo_rw_aio) had already finished processing everything. 
> > In v7.1, the unification and optimization of bi_end_io significantly minimized this latency. > The filesystem now learns of "I/O completion" much faster. Consequently, highly-concurrent > execution pipelines like btrfs or jfs proceed rapidly through kill_sb() and blkdev_put(), > ultimately invoking lo_release() -> __loop_clr_fd() while the loop driver's backend kworker > is still in the middle of executing the last sub-millisecond asynchronous file-backed I/O > request. > > > 2. Why the Block Layer's Built-in Quiesce/Freeze Fails > > There is an implicit assumption that standard block layer freeze mechanisms (blk_mq_freeze_queue()) > protect the device lifetime during release. However, the v7.1 BIO helper refactoring introduced > a synchronization gap: > > 1. The filesystem triggers its final metadata or journal updates (e.g., txCommit in jfs or > delayed refcount updates in btrfs) right during the unmount/close boundary. > 2. Due to the optimized execution path, these requests bypass the block layer's active > request-tracking metrics at the exact moment blk_mq_freeze_queue() or state validation > checks evaluated them as zero. > 3. The block layer assumes the queue is safe and silent, allowing __loop_clr_fd() to > progress and nullify lo->lo_backing_file (or trigger fput()). > 4. The leaked asynchronous kworker wakes up a fraction of a millisecond too late, attempts > to access lo->lo_backing_file or invokes kiocb_end_write() -> file_inode(), leading to > either a general protection fault (Null pointer dereference) or a Use-After-Free (UAF). > > > 3. Why This Isn't Just an "Unexpected FS Bug" > > While the write I/O originates from file systems like btrfs and jfs post-close, blaming the > file systems entirely ignores the underlying infrastructure change. The core issue is that the > block layer altered its synchronization behavior, breaking the barrier contract that > VFS and file systems historically relied on during the device release path. 
> > Papering over this inside individual file systems would require adding heavy, duplicated > barriers inside every single filesystem's unmount path. It sounds like the VFS unmount call needs to have something that waits for sync() to complete. Though, it really feels very strange that an FS can complete unmount without itself ensuring that there are no more IOs in flight. The generic VFS layer cannot know what the FS needs to flush on unmount, so waiting on a generic sync might not be enough. It really feels like this is a btrfs and jfs issue, unless the same can be reproduced with any file system (XFS, ext4, f2fs, ...). Just my 2 cents. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-05-27 18:11 ` Damien Le Moal @ 2026-05-28 8:38 ` Christoph Hellwig 2026-05-28 10:16 ` Qu Wenruo 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2026-05-28 8:38 UTC (permalink / raw) To: Damien Le Moal Cc: Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche, Christoph Hellwig, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote: > It sounds like the VFS unmount call needs to have something that waits for > sync() to complete. Though, it really feels very strange that an FS can complete I don't think this is the VFS-controlled VFS file data writeback, which we wait on, but some kind of fs controlled metadata. And yes, it looks like those file systems are buggy in that area. We definitively had such bugs in XFS before and fixed them. e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against unmount") ^ permalink raw reply [flat|nested] 21+ messages in thread
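The idea of tracking in-flight asynchronous I/O and serializing it against unmount can be illustrated with a small user-space model. This is only a sketch under stated assumptions and is not taken from the XFS commit cited above: struct fs_model and the functions io_submit(), io_complete(), unmount_no_wait() and unmount_wait() are hypothetical names, and "waiting" is modeled single-threaded by running the outstanding completions before teardown.

```c
#include <stdbool.h>

/* Hypothetical model of a filesystem's async metadata I/O accounting. */
struct fs_model {
	int inflight;		/* submitted but not yet completed I/Os */
	bool torn_down;		/* set once unmount released fs structures */
	int late_completions;	/* completions that ran after teardown */
};

/* Account for a newly submitted async I/O. */
void io_submit(struct fs_model *fs)
{
	fs->inflight++;
}

/* Completion handler: touching fs state after teardown is the bug. */
void io_complete(struct fs_model *fs)
{
	if (fs->torn_down)
		fs->late_completions++;	/* a real fs would hit a UAF here */
	fs->inflight--;
}

/* Unmount that never looks at the in-flight count. */
void unmount_no_wait(struct fs_model *fs)
{
	fs->torn_down = true;
}

/*
 * Unmount that serializes against in-flight I/O. In this single-threaded
 * model, "waiting" means letting every outstanding completion run first.
 */
void unmount_wait(struct fs_model *fs)
{
	while (fs->inflight > 0)
		io_complete(fs);
	fs->torn_down = true;
}
```

With two I/Os submitted, unmount_no_wait() followed by the two completions leaves late_completions at 2, while unmount_wait() leaves it at 0. A real filesystem would block on a counter or wait queue instead of running the completions itself.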
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-05-28 8:38 ` Christoph Hellwig @ 2026-05-28 10:16 ` Qu Wenruo 2026-06-01 14:40 ` Christoph Hellwig 2026-06-01 15:29 ` Ming Lei 0 siblings, 2 replies; 21+ messages in thread From: Qu Wenruo @ 2026-05-28 10:16 UTC (permalink / raw) To: Christoph Hellwig, Damien Le Moal Cc: Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner 在 2026/5/28 18:08, Christoph Hellwig 写道: > On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote: >> It sounds like the VFS unmount call needs to have something that waits for >> sync() to complete. Though, it really feels very strange that an FS can complete > > I don't think this is the VFS-controlled VFS file data writeback, which > we wait on, but some kind of fs controlled metadata. And yes, it looks > like those file systems are buggy in that area. We definitively had > such bugs in XFS before and fixed them. > > e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against > unmount") Considering the xfs fix is pretty old, it's before the fix hint thus no such mention in fstests. Do you happen to know which test case is for that fix? I'd like to adapt it for btrfs as a reproducer. This syzbot report doesn't provide a reproducer. Another thing is, if it's some btrfs bios on-the-fly after close_ctree(), the most common symptom should be NULL pointer dereference inside various btrfs endio functions. As all those end_bbio_*() functions are referring to either fs_info or inode/eb, thus if the fs is unmounted before the bio finished, they should all cause use-after-free. The only exception is discard, which is using blkdev_issue_discard() thus has no such reference to btrfs internal structure, but that's out of my understanding. Thanks, Qu ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-05-28 10:16 ` Qu Wenruo @ 2026-06-01 14:40 ` Christoph Hellwig 2026-06-01 16:29 ` Brian Foster 2026-06-01 15:29 ` Ming Lei 1 sibling, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2026-06-01 14:40 UTC (permalink / raw) To: Qu Wenruo Cc: Christoph Hellwig, Damien Le Moal, Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner, Brian Foster On Thu, May 28, 2026 at 07:46:24PM +0930, Qu Wenruo wrote: >> e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against >> unmount") > Considering the xfs fix is pretty old, it's before the fix hint thus no > such mention in fstests. > > Do you happen to know which test case is for that fix? > I'd like to adapt it for btrfs as a reproducer. No. Adding Brian who authored that commit. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-06-01 14:40 ` Christoph Hellwig @ 2026-06-01 16:29 ` Brian Foster 2026-06-01 22:27 ` Qu Wenruo 0 siblings, 1 reply; 21+ messages in thread From: Brian Foster @ 2026-06-01 16:29 UTC (permalink / raw) To: Christoph Hellwig Cc: Qu Wenruo, Damien Le Moal, Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Mon, Jun 01, 2026 at 04:40:34PM +0200, Christoph Hellwig wrote: > On Thu, May 28, 2026 at 07:46:24PM +0930, Qu Wenruo wrote: > >> e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against > >> unmount") > > Considering the xfs fix is pretty old, it's before the fix hint thus no > > such mention in fstests. > > > > Do you happen to know which test case is for that fix? > > I'd like to adapt it for btrfs as a reproducer. > > No. Adding Brian who authored that commit. > I haven't followed through the full thread here... But if you're just looking for an existing test case associated with the commit above on XFS, I did some quick digging and xfs/311 is the original reproducer for that one. Brian ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-06-01 16:29 ` Brian Foster @ 2026-06-01 22:27 ` Qu Wenruo 0 siblings, 0 replies; 21+ messages in thread From: Qu Wenruo @ 2026-06-01 22:27 UTC (permalink / raw) To: Brian Foster, Christoph Hellwig Cc: Damien Le Moal, Tetsuo Handa, Ming Lei, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner 在 2026/6/2 01:59, Brian Foster 写道: > On Mon, Jun 01, 2026 at 04:40:34PM +0200, Christoph Hellwig wrote: >> On Thu, May 28, 2026 at 07:46:24PM +0930, Qu Wenruo wrote: >>>> e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against >>>> unmount") >>> Considering the xfs fix is pretty old, it's before the fix hint thus no >>> such mention in fstests. >>> >>> Do you happen to know which test case is for that fix? >>> I'd like to adapt it for btrfs as a reproducer. >> >> No. Adding Brian who authored that commit. >> > > I haven't followed through the full thread here... But if you're just > looking for an existing test case associated with the commit above on > XFS, I did some quick digging and xfs/311 is the original reproducer for > that one. Thanks a lot! I'll use the same delayed umount to verify the behavior of btrfs. Thanks, Qu > > Brian > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-05-28 10:16 ` Qu Wenruo 2026-06-01 14:40 ` Christoph Hellwig @ 2026-06-01 15:29 ` Ming Lei 2026-06-01 21:51 ` Hillf Danton 1 sibling, 1 reply; 21+ messages in thread From: Ming Lei @ 2026-06-01 15:29 UTC (permalink / raw) To: Qu Wenruo Cc: Christoph Hellwig, Damien Le Moal, Tetsuo Handa, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Thu, May 28, 2026 at 5:16 AM Qu Wenruo <wqu@suse.com> wrote: > > > > 在 2026/5/28 18:08, Christoph Hellwig 写道: > > On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote: > >> It sounds like the VFS unmount call needs to have something that waits for > >> sync() to complete. Though, it really feels very strange that an FS can complete > > > > I don't think this is the VFS-controlled VFS file data writeback, which > > we wait on, but some kind of fs controlled metadata. And yes, it looks > > like those file systems are buggy in that area. We definitively had > > such bugs in XFS before and fixed them. > > > > e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against > > unmount") > Considering the xfs fix is pretty old, it's before the fix hint thus no > such mention in fstests. > > Do you happen to know which test case is for that fix? > I'd like to adapt it for btrfs as a reproducer. > > This syzbot report doesn't provide a reproducer. > > > Another thing is, if it's some btrfs bios on-the-fly after > close_ctree(), the most common symptom should be NULL pointer > dereference inside various btrfs endio functions. > As all those end_bbio_*() functions are referring to either fs_info or > inode/eb, thus if the fs is unmounted before the bio finished, they > should all cause use-after-free. 
> > The only exception is discard, which is using blkdev_issue_discard() > thus has no such reference to btrfs internal structure, but that's out > of my understanding. syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD. https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28 Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture this bio submission context if this req isn't issued via wq. Thanks, Ming Lei ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-06-01 15:29 ` Ming Lei @ 2026-06-01 21:51 ` Hillf Danton 2026-06-01 22:14 ` Ming Lei 0 siblings, 1 reply; 21+ messages in thread From: Hillf Danton @ 2026-06-01 21:51 UTC (permalink / raw) To: Ming Lei Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Mon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote: >On Thu, May 28, 2026 at 5:16 AM Qu Wenruo <wqu@suse.com> wrote: >> 在 2026/5/28 18:08, Christoph Hellwig 写道: >> > On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote: >> >> It sounds like the VFS unmount call needs to have something that waits for >> >> sync() to complete. Though, it really feels very strange that an FS can complete >> > >> > I don't think this is the VFS-controlled VFS file data writeback, which >> > we wait on, but some kind of fs controlled metadata. And yes, it looks >> > like those file systems are buggy in that area. We definitively had >> > such bugs in XFS before and fixed them. >> > >> > e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against >> > unmount") >> Considering the xfs fix is pretty old, it's before the fix hint thus no >> such mention in fstests. >> >> Do you happen to know which test case is for that fix? >> I'd like to adapt it for btrfs as a reproducer. >> >> This syzbot report doesn't provide a reproducer. >> >> >> Another thing is, if it's some btrfs bios on-the-fly after >> close_ctree(), the most common symptom should be NULL pointer >> dereference inside various btrfs endio functions. >> As all those end_bbio_*() functions are referring to either fs_info or >> inode/eb, thus if the fs is unmounted before the bio finished, they >> should all cause use-after-free. 
>>
>> The only exception is discard, which is using blkdev_issue_discard()
>> thus has no such reference to btrfs internal structure, but that's out
>> of my understanding.
>
> syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD.
>
> https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28
>
> Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture
> this bio submission context if this req isn't issued via wq.
>
I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq.

static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
		const struct blk_mq_queue_data *bd)
{
	struct request *rq = bd->rq;
	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
	struct loop_device *lo = rq->q->queuedata;

	blk_mq_start_request(rq);

	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
		return BLK_STS_IOERR;

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-06-01 21:51 ` Hillf Danton @ 2026-06-01 22:14 ` Ming Lei 2026-06-01 23:17 ` Hillf Danton 0 siblings, 1 reply; 21+ messages in thread From: Ming Lei @ 2026-06-01 22:14 UTC (permalink / raw) To: Hillf Danton Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Tue, Jun 02, 2026 at 05:51:26AM +0800, Hillf Danton wrote: > On Mon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote: > >On Thu, May 28, 2026 at 5:16 AM Qu Wenruo <wqu@suse.com> wrote: > >> 在 2026/5/28 18:08, Christoph Hellwig 写道: > >> > On Thu, May 28, 2026 at 03:11:05AM +0900, Damien Le Moal wrote: > >> >> It sounds like the VFS unmount call needs to have something that waits for > >> >> sync() to complete. Though, it really feels very strange that an FS can complete > >> > > >> > I don't think this is the VFS-controlled VFS file data writeback, which > >> > we wait on, but some kind of fs controlled metadata. And yes, it looks > >> > like those file systems are buggy in that area. We definitively had > >> > such bugs in XFS before and fixed them. > >> > > >> > e.g. 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against > >> > unmount") > >> Considering the xfs fix is pretty old, it's before the fix hint thus no > >> such mention in fstests. > >> > >> Do you happen to know which test case is for that fix? > >> I'd like to adapt it for btrfs as a reproducer. > >> > >> This syzbot report doesn't provide a reproducer. > >> > >> > >> Another thing is, if it's some btrfs bios on-the-fly after > >> close_ctree(), the most common symptom should be NULL pointer > >> dereference inside various btrfs endio functions. 
> >> As all those end_bbio_*() functions are referring to either fs_info or > >> inode/eb, thus if the fs is unmounted before the bio finished, they > >> should all cause use-after-free. > >> > >> The only exception is discard, which is using blkdev_issue_discard() > >> thus has no such reference to btrfs internal structure, but that's out > >> of my understanding. > > > > syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD. > > > > https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28 > > > > Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture > > this bio submission context if this req isn't issued via wq. > > > I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq. Can't lo->lo_state be updated after the check? It is totally lockless... Thanks, Ming ^ permalink raw reply [flat|nested] 21+ messages in thread
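The window Ming points at (state checked when the request is queued, backing file used later by deferred work) can be sketched as a deterministic user-space model. This is an illustration under assumptions, not the loop driver's code: the types and the functions queue_rq(), clear_fd(), run_worker() and demo_interleaving() are hypothetical stand-ins, and the interleaving is spelled out step by step instead of using real threads.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the loop driver's state. */
enum lo_state { LO_UNBOUND, LO_BOUND };

struct backing_file { int fd; };

struct loop_dev {
	enum lo_state state;
	struct backing_file *backing;	/* cleared on teardown */
};

struct loop_cmd {
	struct loop_dev *lo;
	int queued;
};

/* Queue-time check: reject the request unless the device is bound. */
int queue_rq(struct loop_dev *lo, struct loop_cmd *cmd)
{
	if (lo->state != LO_BOUND)
		return -1;
	cmd->lo = lo;
	cmd->queued = 1;	/* work is deferred, not executed here */
	return 0;
}

/* Teardown that does not wait for already queued work. */
void clear_fd(struct loop_dev *lo)
{
	lo->state = LO_UNBOUND;
	lo->backing = NULL;
}

/* Deferred worker: returns -1 where a kernel would dereference NULL. */
int run_worker(struct loop_cmd *cmd)
{
	if (!cmd->queued)
		return -1;
	cmd->queued = 0;
	if (cmd->lo->backing == NULL)
		return -1;
	return 0;
}

/* One bad interleaving: check passes, teardown runs, worker runs last. */
int demo_interleaving(void)
{
	struct backing_file f = { 3 };
	struct loop_dev lo = { LO_BOUND, &f };
	struct loop_cmd cmd = { NULL, 0 };

	if (queue_rq(&lo, &cmd) != 0)
		return 0;
	clear_fd(&lo);
	return run_worker(&cmd);
}
```

In this model the queue-time check passes, yet the worker still finds the backing file gone, so a check that is not serialized against teardown narrows the window without closing it.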
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-06-01 22:14 ` Ming Lei @ 2026-06-01 23:17 ` Hillf Danton 2026-06-01 23:36 ` Ming Lei 0 siblings, 1 reply; 21+ messages in thread From: Hillf Danton @ 2026-06-01 23:17 UTC (permalink / raw) To: Ming Lei Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Mon, 1 Jun 2026 17:14:59 -0500 Ming Lei wrote: > On Tue, Jun 02, 2026 at 05:51:26AM +0800, Hillf Danton wrote: > > On Mon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote: > > > syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD. > > > > > > https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28 > > > > > > Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture > > > this bio submission context if this req isn't issued via wq. > > > > > I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq. > > Can't lo->lo_state be updated after the check? It is totally lockless... > Sounds good hm... do you mean it is UNWISE to not flush the loop workqueue when closing disk? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-06-01 23:17 ` Hillf Danton @ 2026-06-01 23:36 ` Ming Lei 2026-06-02 2:02 ` Hillf Danton 0 siblings, 1 reply; 21+ messages in thread From: Ming Lei @ 2026-06-01 23:36 UTC (permalink / raw) To: Hillf Danton Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Tue, Jun 02, 2026 at 07:17:30AM +0800, Hillf Danton wrote: > On Mon, 1 Jun 2026 17:14:59 -0500 Ming Lei wrote: > > On Tue, Jun 02, 2026 at 05:51:26AM +0800, Hillf Danton wrote: > > > On Mon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote: > > > > syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD. > > > > > > > > https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28 > > > > > > > > Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture > > > > this bio submission context if this req isn't issued via wq. > > > > > > > I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq. > > > > Can't lo->lo_state be updated after the check? It is totally lockless... > > > Sounds good hm... do you mean it is UNWISE to not flush the loop workqueue > when closing disk? Quite the opposite, it is wise to not flush wq in __loop_clr_fd(), please see my previous comment. Thanks, Ming ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-06-01 23:36 ` Ming Lei @ 2026-06-02 2:02 ` Hillf Danton 0 siblings, 0 replies; 21+ messages in thread From: Hillf Danton @ 2026-06-02 2:02 UTC (permalink / raw) To: Ming Lei Cc: Qu Wenruo, Christoph Hellwig, Damien Le Moal, Tetsuo Handa, Jens Axboe, Bart Van Assche, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Mon, 1 Jun 2026 18:36:19 -0500 Ming Lei wrote: > On Tue, Jun 02, 2026 at 07:17:30AM +0800, Hillf Danton wrote: > > On Mon, 1 Jun 2026 17:14:59 -0500 Ming Lei wrote: > > > On Tue, Jun 02, 2026 at 05:51:26AM +0800, Hillf Danton wrote: > > > > On Mon, 1 Jun 2026 10:29:25 -0500 Ming Lei wrote: > > > > > syzbot log shows the null-ptr-deref is on WRITE, instead of DISCARD. > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28 > > > > > > > > > > Adding WARN_ON(!lo->lo_backing_file) in loop_queue_rq() might capture > > > > > this bio submission context if this req isn't issued via wq. > > > > > > > > > I suspect this makes $.02 sense given the check of Lo_bound upon queuing rq. > > > > > > Can't lo->lo_state be updated after the check? It is totally lockless... > > > > > Sounds good hm... do you mean it is UNWISE to not flush the loop workqueue > > when closing disk? > > Quite the opposite, it is wise to not flush wq in __loop_clr_fd(), please > see my previous comment. > When queuing rq, if lo_state is updated after checking Lo_bound, I see nothing that prevents the null-ptr-deref that syzbot reported. Can you pinpoint why the flush is NOT needed if you are right? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-05-27 3:00 ` [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() Ming Lei 2026-05-27 11:29 ` Tetsuo Handa @ 2026-05-28 5:43 ` Hillf Danton 2026-05-28 23:00 ` Hillf Danton 1 sibling, 1 reply; 21+ messages in thread From: Hillf Danton @ 2026-05-28 5:43 UTC (permalink / raw) To: Ming Lei Cc: Tetsuo Handa, Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Tue, 26 May 2026 22:00:49 -0500 Ming Lei wrote: >On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote: >> On 2026/05/27 10:20, Ming Lei wrote: >> >> Of course we should try to figure out the root cause first, but how can we do? >> > >> > Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious, >> > which may cause data loss, so CC btrfs list and maintainer. >> >> Why do you assume that the culprit is btrfs? >> >> https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 indicated that >> this similar race is also happening with jfs. > > I just didn't see the above report on jfs. > > It doesn't change anything, the same question still stands: unexpected write IO is issued > or crosses umount & last closing of loop disk. > Given the loop workqueue that triggered the jfs warning, can you specify the reason why the workqueue in question is NOT flushed while closing disk? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() 2026-05-28 5:43 ` Hillf Danton @ 2026-05-28 23:00 ` Hillf Danton 2026-05-29 0:14 ` Tetsuo Handa 0 siblings, 1 reply; 21+ messages in thread From: Hillf Danton @ 2026-05-28 23:00 UTC (permalink / raw) To: Ming Lei Cc: Tetsuo Handa, Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal, linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner On Thu, 28 May 2026 13:43:31 +0800 Hillf Danton wrote: >On Tue, 26 May 2026 22:00:49 -0500 Ming Lei wrote: >>On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote: >>> On 2026/05/27 10:20, Ming Lei wrote: >>> >> Of course we should try to figure out the root cause first, but how can we do? >>> > >>> > Definitely unexpected write IO(after umount & loop closed) from btrfs is more serious, >>> > which may cause data loss, so CC btrfs list and maintainer. >>> >>> Why do you assume that the culprit is btrfs? >>> >>> https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 indicated that >>> this similar race is also happening with jfs. >> >> I just didn't see the above report on jfs. >> >> It doesn't change anything, the same question still stands: unexpected write IO is issued >> or crosses umount & last closing of loop disk. >> > Given the loop workqueue that triggered the jfs warning, can you specify > the reason why the workqueue in question is NOT flushed while closing disk? > Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail. And the deadlock can be reproduced by flushing the loop workqueue with disk->open_mutex held [1]. [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3) https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/ ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Tetsuo Handa @ 2026-05-29 0:14 UTC
To: Hillf Danton, Ming Lei
Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
    linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
    David Sterba, linux-fsdevel, Christian Brauner

On 2026/05/29 8:00, Hillf Danton wrote:
>> Given the loop workqueue that triggered the jfs warning, can you specify
>> the reason why the workqueue in question is NOT flushed while closing disk?
>>
> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
> And the deadlock can be reproduced by flushing the loop workqueue with
> disk->open_mutex held [1].
>
> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/

We can avoid the following lockdep warnings (including [1] you mentioned)

  https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
  https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
  https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
  https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
  https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4

caused by "drain_workqueue() with disk->open_mutex held" if we assign
caller-specific lockdep class to disk->open_mutex

  https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/

.

Also, we can avoid lockdep warning caused by "drain_workqueue() with disk->open_mutex held" +
"holding system_transition_mutex" if we forbid binding to pseudo files as backing file
in the loop driver

  https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@I-love.SAKURA.ne.jp

which we can reproduce with

  echo 7:0 > /sys/power/resume
  losetup /dev/loop0 /sys/power/resume
  cat /dev/loop0 > /dev/null
  losetup -d /dev/loop0

.

Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
held" in the loop driver side.

However, the possibility that the last milli-second writeback request
(which runs during unmount operation) from filesystem fails due to

	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
		return BLK_STS_IOERR;

check in loop_queue_rq() will remain. Therefore, addressing this problem
within individual filesystem will be more strict solution. But guessing from
the pace jfs fixes bugs, it would take long time before we stop seeing
this problem...
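The lo_state check in loop_queue_rq() discussed in this message can be
illustrated with a small userspace model. This is only a sketch under stated
assumptions: toy_loop, toy_queue_rq() and the enum names below are
hypothetical stand-ins rather than kernel symbols, and locking and memory
ordering are not modeled. It shows why a writeback request that arrives
after the device has left the bound state is failed instead of being queued.

```c
/* Hypothetical userspace model; none of these names exist in the kernel. */
enum toy_lo_state { TOY_LO_UNBOUND, TOY_LO_BOUND, TOY_LO_RUNDOWN };
enum toy_blk_status { TOY_STS_OK = 0, TOY_STS_IOERR = 1 };

struct toy_loop {
	enum toy_lo_state state; /* stands in for lo->lo_state */
	int queued;              /* number of requests accepted so far */
};

/*
 * Accept a request only while the device is bound. Anything that arrives
 * in another state (for example during rundown) is failed immediately,
 * which is the case the thread calls the "last milli-second" writeback.
 */
enum toy_blk_status toy_queue_rq(struct toy_loop *lo)
{
	if (lo->state != TOY_LO_BOUND)
		return TOY_STS_IOERR;
	lo->queued++;
	return TOY_STS_OK;
}
```

In this model a request submitted while the state is TOY_LO_BOUND is counted,
and one submitted after the state moves to TOY_LO_RUNDOWN gets TOY_STS_IOERR,
so the caller sees an I/O error unless it waited for completion before the
device was closed.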
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Hillf Danton @ 2026-05-29 7:04 UTC
To: Tetsuo Handa
Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
    Ming Lei, linux-block, LKML, Andrew Morton, Linus Torvalds,
    linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner

On Fri, 29 May 2026 09:14:47 +0900 Tetsuo Handa wrote:
>On 2026/05/29 8:00, Hillf Danton wrote:
>>> Given the loop workqueue that triggered the jfs warning, can you specify
>>> the reason why the workqueue in question is NOT flushed while closing disk?
>>>
>> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
>> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
>> And the deadlock can be reproduced by flushing the loop workqueue with
>> disk->open_mutex held [1].
>>
>> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
>> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/
>
>We can avoid the following lockdep warnings (including [1] you mentioned)
>
>  https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
>  https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
>  https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
>  https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
>  https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4
>
>caused by "drain_workqueue() with disk->open_mutex held" if we assign
>caller-specific lockdep class to disk->open_mutex
>
>  https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/
>
>.
>
>Also, we can avoid lockdep warning caused by "drain_workqueue() with disk->open_mutex held" +
>"holding system_transition_mutex" if we forbid binding to pseudo files as backing file
>in the loop driver
>
>  https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@I-love.SAKURA.ne.jp
>
>which we can reproduce with
>
>  echo 7:0 > /sys/power/resume
>  losetup /dev/loop0 /sys/power/resume
>  cat /dev/loop0 > /dev/null
>  losetup -d /dev/loop0
>
>.
>
>Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
>held" in the loop driver side.
>
Good news.
>
>
>However, the possibility that the last milli-second writeback request
>(which runs during unmount operation) from filesystem fails due to
>
>	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
>		return BLK_STS_IOERR;
>
>check in loop_queue_rq() will remain.

This conflicts with "There is no need to destroy the workqueue when clearing
unbinding a loop device from a backing file." in d292dc80686a

>Therefore, addressing this problem
>within individual filesystem will be more strict solution. But guessing from

Conflicts with "Another thing is, if it's some btrfs bios on-the-fly after
close_ctree(), the most common symptom should be NULL pointer dereference
inside various btrfs endio functions." [2] once more. And you need to pay
the fs guys more than two cents I think for cooking a FIX.

[2] Subject: Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
https://lore.kernel.org/lkml/36571f8a-4df8-4152-b078-d82dbff4ad7e@suse.com/

>the pace jfs fixes bugs, it would take long time before we stop seeing
>this problem...
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Hillf Danton @ 2026-05-29 22:05 UTC
To: Tetsuo Handa
Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
    Ming Lei, linux-block, LKML, Andrew Morton, Linus Torvalds,
    linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner,
    syzbot+78ad2c6a58c0a1faa5f5

On Fri, 29 May 2026 15:04:10 +0800 Hillf Danton wrote:
>On Fri, 29 May 2026 09:14:47 +0900 Tetsuo Handa wrote:
>>On 2026/05/29 8:00, Hillf Danton wrote:
>>>> Given the loop workqueue that triggered the jfs warning, can you specify
>>>> the reason why the workqueue in question is NOT flushed while closing disk?
>>>>
>>> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
>>> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
>>> And the deadlock can be reproduced by flushing the loop workqueue with
>>> disk->open_mutex held [1].
>>>
>>> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
>>> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@google.com/
>>
>>We can avoid the following lockdep warnings (including [1] you mentioned)
>>
>>  https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
>>  https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
>>  https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
>>  https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
>>  https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4
>>
>>caused by "drain_workqueue() with disk->open_mutex held" if we assign
>>caller-specific lockdep class to disk->open_mutex
>>
>>  https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/
>>
>>.
>>
>>Also, we can avoid lockdep warning caused by "drain_workqueue() with disk->open_mutex held" +
>>"holding system_transition_mutex" if we forbid binding to pseudo files as backing file
>>in the loop driver
>>
>>  https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@I-love.SAKURA.ne.jp
>>
>>which we can reproduce with
>>
>>  echo 7:0 > /sys/power/resume
>>  losetup /dev/loop0 /sys/power/resume
>>  cat /dev/loop0 > /dev/null
>>  losetup -d /dev/loop0
>>
>>.
>>
>> Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
>> held" in the loop driver side.
>>
> Good news.
>
Bad news: Subject: [syzbot] [block?] possible deadlock in loop_process_work
[3] https://lore.kernel.org/lkml/6a19f5f7.5099cdd9.8e407.0004.GAE@google.com/

syzbot found the following issue on:

HEAD commit:    c1ecb239fa34 Add linux-next specific files for 20260522
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12fa6336580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=77a9211ff284de54
dashboard link: https://syzkaller.appspot.com/bug?extid=78ad2c6a58c0a1faa5f5
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/4cb88c910144/disk-c1ecb239.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/4a9bc938cf88/vmlinux-c1ecb239.xz
kernel image: https://storage.googleapis.com/syzbot-assets/684f1e33f264/bzImage-c1ecb239.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+78ad2c6a58c0a1faa5f5@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G L
------------------------------------------------------
kworker/u8:15/1491 is trying to acquire lock:
ffff88805e1a6480 (sb_writers#5){.+.+}-{0:0}, at: do_req_filebacked drivers/block/loop.c:433 [inline]
ffff88805e1a6480 (sb_writers#5){.+.+}-{0:0}, at: loop_handle_cmd drivers/block/loop.c:1941 [inline]
ffff88805e1a6480 (sb_writers#5){.+.+}-{0:0}, at: loop_process_work+0x637/0x11b0 drivers/block/loop.c:1976

but task is already holding lock:
ffffc90006e27c40 ((work_completion)(&worker->work)){+.+.}-{0:0}, at: process_one_work+0x8be/0x1630 kernel/workqueue.c:3294

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #7 ((work_completion)(&worker->work)){+.+.}-{0:0}:
       process_one_work+0x8d7/0x1630 kernel/workqueue.c:3294
       process_scheduled_works kernel/workqueue.c:3401 [inline]
       worker_thread+0xb49/0x1140 kernel/workqueue.c:3482
       kthread+0x388/0x470 kernel/kthread.c:436
       ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #6 ((wq_completion)loop4){+.+.}-{0:0}:
       touch_wq_lockdep_map+0xcb/0x180 kernel/workqueue.c:4033
       __flush_workqueue+0x14b/0x14f0 kernel/workqueue.c:4075
       drain_workqueue+0xd3/0x390 kernel/workqueue.c:4239
       __loop_clr_fd drivers/block/loop.c:1130 [inline]
       lo_release+0x287/0x8f0 drivers/block/loop.c:1767
       bdev_release+0x541/0x660 block/bdev.c:-1
       blkdev_release+0x15/0x20 block/fops.c:705
       __fput+0x461/0xa70 fs/file_table.c:510
       fput_close_sync+0x11f/0x240 fs/file_table.c:615
       __do_sys_close fs/open.c:1511 [inline]
       __se_sys_close fs/open.c:1496 [inline]
       __x64_sys_close+0x7e/0x110 fs/open.c:1496
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #5 (&disk->open_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/rtmutex_api.c:559 [inline]
       mutex_lock_nested+0x5a/0x1d0 kernel/locking/rtmutex_api.c:578
       __del_gendisk+0x127/0x980 block/genhd.c:710
       del_gendisk+0xe7/0x160 block/genhd.c:823
       nbd_dev_remove drivers/block/nbd.c:268 [inline]
       nbd_dev_remove_work+0x47/0xe0 drivers/block/nbd.c:284
       process_one_work+0x98b/0x1630 kernel/workqueue.c:3318
       process_scheduled_works kernel/workqueue.c:3401 [inline]
       worker_thread+0xb49/0x1140 kernel/workqueue.c:3482
       kthread+0x388/0x470 kernel/kthread.c:436
       ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #4 (&set->update_nr_hwq_lock){++++}-{4:4}:
       down_read+0x97/0x200 kernel/locking/rwsem.c:1568
       add_disk_fwnode+0xe7/0x480 block/genhd.c:596
       add_disk include/linux/blkdev.h:794 [inline]
       nbd_dev_add+0x72c/0xb50 drivers/block/nbd.c:1984
       nbd_genl_connect+0x965/0x1c80 drivers/block/nbd.c:2125
       genl_family_rcv_msg_doit+0x22a/0x330 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x61c/0x7a0 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2551
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x780/0x920 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1895
       sock_sendmsg_nosec+0x112/0x150 net/socket.c:797
       __sock_sendmsg net/socket.c:812 [inline]
       ____sys_sendmsg+0x55c/0x870 net/socket.c:2716
       ___sys_sendmsg+0x2a5/0x360 net/socket.c:2770
       __sys_sendmsg net/socket.c:2802 [inline]
       __do_sys_sendmsg net/socket.c:2807 [inline]
       __se_sys_sendmsg net/socket.c:2805 [inline]
       __x64_sys_sendmsg+0x1c3/0x2a0 net/socket.c:2805
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #3 (genl_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/rtmutex_api.c:559 [inline]
       mutex_lock_nested+0x5a/0x1d0 kernel/locking/rtmutex_api.c:578
       genl_lock net/netlink/genetlink.c:35 [inline]
       genl_lock_all net/netlink/genetlink.c:48 [inline]
       genl_register_family+0x7b9/0x17b0 net/netlink/genetlink.c:784
       vdpa_init+0x39/0x70 drivers/vdpa/vdpa.c:1565
       do_one_initcall+0x250/0x870 init/main.c:1347
       do_initcall_level+0x104/0x190 init/main.c:1409
       do_initcalls+0x59/0xa0 init/main.c:1425
       kernel_init_freeable+0x2a6/0x3e0 init/main.c:1658
       kernel_init+0x1d/0x1d0 init/main.c:1548
       ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #2 (cb_lock){++++}-{4:4}:
       down_read+0x97/0x200 kernel/locking/rwsem.c:1568
       genl_rcv+0x19/0x40 net/netlink/genetlink.c:1217
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x780/0x920 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1895
       sock_sendmsg_nosec+0x112/0x150 net/socket.c:797
       __sock_sendmsg net/socket.c:812 [inline]
       sock_sendmsg+0x1ca/0x2d0 net/socket.c:835
       splice_to_socket+0xae5/0x11f0 fs/splice.c:884
       do_splice_from fs/splice.c:936 [inline]
       do_splice+0xef8/0x1940 fs/splice.c:1349
       __do_splice fs/splice.c:1431 [inline]
       __do_sys_splice fs/splice.c:1634 [inline]
       __se_sys_splice+0x353/0x490 fs/splice.c:1616
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (&pipe->mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/rtmutex_api.c:559 [inline]
       mutex_lock_nested+0x5a/0x1d0 kernel/locking/rtmutex_api.c:578
       iter_file_splice_write+0x1f3/0x10f0 fs/splice.c:682
       do_splice_from fs/splice.c:936 [inline]
       do_splice+0xef8/0x1940 fs/splice.c:1349
       __do_splice fs/splice.c:1431 [inline]
       __do_sys_splice fs/splice.c:1634 [inline]
       __se_sys_splice+0x353/0x490 fs/splice.c:1616
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (sb_writers#5){.+.+}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3167 [inline]
       check_prevs_add kernel/locking/lockdep.c:3286 [inline]
       validate_chain kernel/locking/lockdep.c:3910 [inline]
       __lock_acquire+0x15a5/0x2d10 kernel/locking/lockdep.c:5239
       lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5870
       percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
       percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
       __sb_start_write include/linux/fs/super.h:19 [inline]
       sb_start_write include/linux/fs/super.h:125 [inline]
       kiocb_start_write include/linux/fs.h:2767 [inline]
       lo_rw_aio+0xb1b/0xf00 drivers/block/loop.c:401
       do_req_filebacked drivers/block/loop.c:433 [inline]
       loop_handle_cmd drivers/block/loop.c:1941 [inline]
       loop_process_work+0x637/0x11b0 drivers/block/loop.c:1976
       process_one_work+0x98b/0x1630 kernel/workqueue.c:3318
       process_scheduled_works kernel/workqueue.c:3401 [inline]
       worker_thread+0xb49/0x1140 kernel/workqueue.c:3482
       kthread+0x388/0x470 kernel/kthread.c:436
       ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

other info that might help us debug this:

Chain exists of:
  sb_writers#5 --> (wq_completion)loop4 --> (work_completion)(&worker->work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&worker->work));
                               lock((wq_completion)loop4);
                               lock((work_completion)(&worker->work));
  rlock(sb_writers#5);

 *** DEADLOCK ***
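The circular dependency in the report above can be reproduced with a toy
dependency graph. This is a minimal sketch, assuming a fixed-size adjacency
matrix and a plain depth-first search; it is not how lockdep is implemented,
and lock_graph, add_dependency() and the enum constants are hypothetical
names chosen to mirror locks #0 to #7 of the chain. An edge "held -> wanted"
is recorded when a lock is taken while another is held, and an edge is
refused when the wanted lock can already reach the held one.

```c
#include <stdbool.h>

#define MAX_LOCKS 16

/* Hypothetical lock classes, numbered like #0..#7 in the report. */
enum toy_lock {
	SB_WRITERS,         /* #0 */
	PIPE_MUTEX,         /* #1 */
	CB_LOCK,            /* #2 */
	GENL_MUTEX,         /* #3 */
	UPDATE_NR_HWQ_LOCK, /* #4 */
	OPEN_MUTEX,         /* #5 */
	WQ_LOOP,            /* #6: (wq_completion)loop4 */
	WORK_ITEM           /* #7: (work_completion)(&worker->work) */
};

/* dep[a][b] is true when lock b was acquired while lock a was held. */
struct lock_graph {
	bool dep[MAX_LOCKS][MAX_LOCKS];
};

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "wanted was taken while held was held". Returns false, and records
 * nothing, when the new edge would close a cycle. Indices must be below
 * MAX_LOCKS.
 */
bool add_dependency(struct lock_graph *g, int held, int wanted)
{
	bool seen[MAX_LOCKS] = { false };

	if (reachable(g, wanted, held, seen))
		return false;
	g->dep[held][wanted] = true;
	return true;
}
```

Feeding the edges SB_WRITERS -> PIPE_MUTEX -> ... -> WQ_LOOP -> WORK_ITEM
into this model and then asking for WORK_ITEM -> SB_WRITERS (the loop worker
calling sb_start_write()) is refused, because SB_WRITERS already reaches
WORK_ITEM through the drain_workqueue() call made under disk->open_mutex.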
* Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Tetsuo Handa @ 2026-05-30 23:57 UTC
To: Hillf Danton, Jens Axboe
Cc: Bart Van Assche, Christoph Hellwig, Damien Le Moal, Ming Lei,
    linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
    David Sterba, linux-fsdevel, Christian Brauner

On 2026/05/30 7:05, Hillf Danton wrote:
>>> Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
>>> held" in the loop driver side.
>>>
>> Good news.
>>
> Bad news: Subject: [syzbot] [block?] possible deadlock in loop_process_work
> [3] https://lore.kernel.org/lkml/6a19f5f7.5099cdd9.8e407.0004.GAE@google.com/
>

OK.

I sent two patches

  https://lkml.kernel.org/r/147ed056-03d9-4214-b925-0f10fc00cf27@I-love.SAKURA.ne.jp
  https://lkml.kernel.org/r/148efba2-a0b6-47d7-ac76-b19d2f4b696c@I-love.SAKURA.ne.jp

as a preparation for evaluating the possibility of calling drain_workqueue()
from __loop_clr_fd(). But as far as syzbot has tested using linux-next tree

  https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
  https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97

seems to remain even if we applied above patches. Therefore, I think that
we need to call drain_workqueue() from __loop_clr_fd() without holding
disk->open_mutex (if we address this NULL pointer dereference problem by
updating the loop driver).

"[PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()" was an attempt
to call drain_workqueue() from __loop_clr_fd() without holding disk->open_mutex,
but Sashiko's review
( https://sashiko.dev/#/patchset/fda8abc8-6aa2-463b-bf72-865f6b838034%40I-love.SAKURA.ne.jp )
mentioned that the "module_put(THIS_MODULE);" executed as the last step of
__loop_clr_fd() has a race window of concurrently triggering module unload
operation because module refcount of the loop driver can become 0 due to this
module_put(THIS_MODULE) call. In other words, we cannot safely manage refcount
of the loop module without a support by the caller of lo_release()
(i.e. bdev_release()).

  void bdev_release(struct file *bdev_file)
  {
  	(...snipped...)
  	if (bdev_is_partition(bdev))
  		blkdev_put_part(bdev);
  	else
  		blkdev_put_whole(bdev);
  	mutex_unlock(&disk->open_mutex); // <= Keeping holding disk->open_mutex until __loop_clr_fd() completes causes circular locking problem.

  	module_put(disk->fops->owner); // <= Calling after __loop_clr_fd() completed is required for managing module refcount safely.
  put_no_open:
  	blkdev_put_no_open(bdev);
  }

Therefore, I think that the only robust and safe approach is, although you
won't be happy to see layering violation / tricky code, either

  (a) allow __loop_clr_fd() to temporarily drop disk->open_mutex

or

  (b) add a new callback for the loop driver which is called between
      mutex_unlock(&disk->open_mutex) and module_put(disk->fops->owner)

. Jens, what do you think?

One might argue that this problem should be fixed on the filesystem side by
ensuring all filesystems wait for I/O requests safely. However, from the
perspective of defensive programming, the loop driver should be robust enough
to handle incomplete I/O serialization from underlying layers to prevent GPF.
Furthermore, without adding noisy debug printk() messages, it is extremely
difficult to pinpoint which specific layer or filesystem failed to wait for
the I/O requests.
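The ordering that option (b) asks for can be shown with a small userspace
model. This is a sketch under stated assumptions, not a proposed kernel
interface: the post_release hook, toy_fops, toy_bdev_release() and the step
names are all hypothetical. It only records the order of events, to show
that a hook placed after the unlock and before the module reference drop
lets the driver drain its work without holding open_mutex while the module
is still pinned.

```c
#include <stddef.h>

/* Events recorded by the model, in the order they happen. */
enum toy_step {
	STEP_RELEASE,      /* ->release() runs with open_mutex held */
	STEP_UNLOCK,       /* mutex_unlock(&disk->open_mutex) */
	STEP_POST_RELEASE, /* hypothetical hook: drain work here */
	STEP_MODULE_PUT    /* module_put(disk->fops->owner) */
};

struct toy_trace {
	enum toy_step steps[8];
	size_t n;
};

/* Hypothetical driver operations; post_release does not exist upstream. */
struct toy_fops {
	void (*release)(struct toy_trace *t);
	void (*post_release)(struct toy_trace *t);
};

void toy_record(struct toy_trace *t, enum toy_step s)
{
	if (t->n < sizeof(t->steps) / sizeof(t->steps[0]))
		t->steps[t->n++] = s;
}

/* Models the tail of bdev_release() with the extra hook from option (b). */
void toy_bdev_release(const struct toy_fops *fops, struct toy_trace *t)
{
	fops->release(t);            /* called under open_mutex */
	toy_record(t, STEP_UNLOCK);  /* open_mutex dropped */
	if (fops->post_release)
		fops->post_release(t);   /* module refcount still held */
	toy_record(t, STEP_MODULE_PUT);
}

void toy_lo_release(struct toy_trace *t)
{
	toy_record(t, STEP_RELEASE);
}

void toy_lo_post_release(struct toy_trace *t)
{
	toy_record(t, STEP_POST_RELEASE);
}
```

With both callbacks installed the recorded order is release, unlock,
post_release, module_put, which is the window described in (b). Option (a)
would instead move the unlock and a later relock inside the release step.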
* [PATCH v4] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Tetsuo Handa @ 2026-06-07 10:54 UTC
To: Jens Axboe
Cc: Bart Van Assche, Christoph Hellwig, Damien Le Moal, Ming Lei,
    linux-block, LKML, Andrew Morton, Linus Torvalds, linux-btrfs,
    David Sterba, linux-fsdevel, Christian Brauner, Hillf Danton

syzbot is reporting NULL pointer dereference in lo_rw_aio() [1][2].

An analysis by the Gemini AI collaborator [3] considers that this problem
is caused by a timing shift primarily exposed by commit 65565ca5f99b
("block: unify the synchronous bi_end_io callbacks"), along with helper
refactorings like commit 92c3737a2473 ("block: add a bio_submit_or_kill
helper").

But due to difficulty of reproducing this race, discussion about what is
happening and how to fix this problem is stalling. Also, we haven't
identified how many filesystems are subjected to this problem.

Therefore, this patch introduces a grace period for flushing pending I/O
requests (which should be a good thing from the perspective of defensive
programming) so that we won't hit NULL pointer dereference problem, and
also emits BUG: message in order to help filesystem developers identify
the caller of an I/O request that failed to wait for completion so that
filesystem developers can fix such caller to wait for completion.

Note that emitting BUG: message is enabled only if CONFIG_KCOV=y, for
this check is a waste of computation resources for almost all users.

Link: https://syzkaller.appspot.com/bug?extid=cd8a9a308e879a4e2c28 [1]
Link: https://syzkaller.appspot.com/bug?extid=bc273027d5643e48e5b3 [2]
Link: https://lkml.kernel.org/r/fbb3edda-f108-4e5b-acf2-266f043f8125@I-love.SAKURA.ne.jp [3]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/block/loop.c | 82 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0000913f7efc..4ff254d8b623 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -85,8 +85,26 @@ struct loop_cmd {
 	struct bio_vec *bvec;
 	struct cgroup_subsys_state *blkcg_css;
 	struct cgroup_subsys_state *memcg_css;
+#ifdef CONFIG_KCOV
+	unsigned long stack_entries[30];
+	int stack_nr;
+	pid_t pid;
+	char comm[TASK_COMM_LEN];
+#endif
 };
 
+static void loop_check_io_race(struct loop_device *lo, struct loop_cmd *cmd)
+{
+#ifdef CONFIG_KCOV
+	if (unlikely(data_race(READ_ONCE(lo->lo_state)) == Lo_rundown)) {
+		pr_err("BUG: %s/%u is doing I/O request on loop%d in Lo_rundown state.\n",
+		       cmd->comm, cmd->pid, lo->lo_number);
+		printk("Call trace:\n");
+		stack_trace_print(cmd->stack_entries, cmd->stack_nr, 4);
+	}
+#endif
+}
+
 #define LOOP_IDLE_WORKER_TIMEOUT (60 * HZ)
 #define LOOP_DEFAULT_HW_Q_DEPTH 128
 
@@ -1747,8 +1765,59 @@ static void lo_release(struct gendisk *disk)
 	need_clear = (lo->lo_state == Lo_rundown);
 	mutex_unlock(&lo->lo_mutex);
 
-	if (need_clear)
+	if (need_clear) {
+		/*
+		 * Temporarily release disk->open_mutex in order to flush pending I/O
+		 * requests before clearing the backing device.
+		 *
+		 * This is a layering violation. But since bdev->bd_disk->fops->release()
+		 * (which is mapped to lo_release()) is the final function which
+		 * blkdev_put_whole() from bdev_release() calls immediately before
+		 * releasing disk->open_mutex, this changes nothing except opens a new
+		 * race window for allowing disk->fops->open() (which is mapped to
+		 * lo_open()) to be called.
+		 *
+		 * Even if lo_open() is called from blkdev_get_whole() due to this race,
+		 * the Lo_rundown state guarantees that lo_open() will fail with -ENXIO.
+		 * Thus, there will be effectively no change caused by this violation.
+		 */
+		mutex_unlock(&lo->lo_disk->open_mutex);
+		/*
+		 * Now that loop_queue_rq() sees lo->lo_state != Lo_bound,
+		 * wait for already started loop_queue_rq() to complete.
+		 */
+		synchronize_rcu();
+		/*
+		 * Now that no more works are scheduled by loop_queue_rq(),
+		 * wait for already scheduled works to complete.
+		 */
+		drain_workqueue(lo->workqueue);
+		/*
+		 * Now that no more AIO requests are scheduled by lo_rw_aio(),
+		 * wait for already started AIO to complete.
+		 *
+		 * Due to synchronize_rcu() + drain_workqueue() sequence above,
+		 * calling blk_mq_unfreeze_queue() immediately after blk_mq_freeze_queue()
+		 * returns has to be safe, for loop_queue_rq() no longer schedules new
+		 * lo_rw_aio() works and lo_rw_aio() no longer submits new AIO requests.
+		 *
+		 * Deferring blk_mq_unfreeze_queue() does not help because we are about
+		 * to clear the backing device and drop the refcount for the backing device.
+		 * There is nothing we can do if blk_mq_freeze_queue() fails to flush.
+		 */
+		blk_mq_unfreeze_queue(lo->lo_queue, blk_mq_freeze_queue(lo->lo_queue));
+		/*
+		 * Perform remaining cleanup, with disk->open_mutex held.
+		 *
+		 * The lo->lo_state should remain Lo_rundown despite we temporarily
+		 * released disk->open_mutex, for I am the only and the last user of
+		 * this loop device because lo_open() cannot succeed.
+		 */
+		mutex_lock(&lo->lo_disk->open_mutex);
+		if (WARN_ON(data_race(READ_ONCE(lo->lo_state)) != Lo_rundown))
+			return;
 		__loop_clr_fd(lo);
+	}
 }
 
 static void lo_free_disk(struct gendisk *disk)
@@ -1855,10 +1924,18 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
 	struct loop_device *lo = rq->q->queuedata;
 
+#ifdef CONFIG_KCOV
+	cmd->stack_nr = stack_trace_save(cmd->stack_entries, ARRAY_SIZE(cmd->stack_entries), 0);
+	cmd->pid = current->pid;
+	get_task_comm(cmd->comm, current);
+#endif
+
 	blk_mq_start_request(rq);
 
-	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
+	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound) {
+		loop_check_io_race(lo, cmd);
 		return BLK_STS_IOERR;
+	}
 
 	switch (req_op(rq)) {
 	case REQ_OP_FLUSH:
@@ -1901,6 +1978,7 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 	int ret = 0;
 	struct mem_cgroup *old_memcg = NULL;
 
+	loop_check_io_race(lo, cmd);
 	if (write && (lo->lo_flags & LO_FLAGS_READ_ONLY)) {
 		ret = -EIO;
 		goto failed;
-- 
2.47.3
Thread overview: 21+ messages
[not found] <ag0lS_CbKO9R5CV8@fedora>
[not found] ` <94076bc9-2c09-4bb6-8468-b6b8af419cb9@I-love.SAKURA.ne.jp>
[not found] ` <ag1nfIFcykmQHbkk@fedora>
[not found] ` <1ab8c579-eb76-4227-8a72-6ec819135219@I-love.SAKURA.ne.jp>
[not found] ` <ag1223nAa0wZ8ALC@fedora>
[not found] ` <fda8abc8-6aa2-463b-bf72-865f6b838034@I-love.SAKURA.ne.jp>
[not found] ` <ahRocb0Vs_m6RF_O@fedora>
[not found] ` <1a9f53d4-6f48-4df8-a3d8-2b0e442a163a@I-love.SAKURA.ne.jp>
[not found] ` <ahZGxoI6oHQ_vSrx@fedora>
[not found] ` <d1b5a737-f0e3-4927-b762-430b37fbb2f9@I-love.SAKURA.ne.jp>
2026-05-27 3:00 ` [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio() Ming Lei
2026-05-27 11:29 ` Tetsuo Handa
2026-05-27 18:11 ` Damien Le Moal
2026-05-28 8:38 ` Christoph Hellwig
2026-05-28 10:16 ` Qu Wenruo
2026-06-01 14:40 ` Christoph Hellwig
2026-06-01 16:29 ` Brian Foster
2026-06-01 22:27 ` Qu Wenruo
2026-06-01 15:29 ` Ming Lei
2026-06-01 21:51 ` Hillf Danton
2026-06-01 22:14 ` Ming Lei
2026-06-01 23:17 ` Hillf Danton
2026-06-01 23:36 ` Ming Lei
2026-06-02 2:02 ` Hillf Danton
2026-05-28 5:43 ` Hillf Danton
2026-05-28 23:00 ` Hillf Danton
2026-05-29 0:14 ` Tetsuo Handa
2026-05-29 7:04 ` Hillf Danton
2026-05-29 22:05 ` Hillf Danton
2026-05-30 23:57 ` Tetsuo Handa
2026-06-07 10:54 ` [PATCH v4] " Tetsuo Handa