* [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
From: Gerald Yang @ 2026-01-28  7:45 UTC
To: tytso, adilger.kernel, jack; +Cc: linux-ext4, gerald.yang.tw, Gerald Yang

When remounting the filesystem to read only in data=journal mode
it may dump the following call trace:

[   71.629350] CPU: 0 UID: 0 PID: 177 Comm: kworker/u96:5 Tainted: G E 6.19.0-rc7 #1 PREEMPT(voluntary)
[   71.629352] Tainted: [E]=UNSIGNED_MODULE
[   71.629353] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
[   71.629354] Workqueue: writeback wb_workfn (flush-7:4)
[   71.629359] RIP: 0010:ext4_journal_check_start+0x8b/0xd0
[   71.629360] Code: 31 ff 45 31 c0 45 31 c9 e9 42 ad c4 00 48 8b 5d f8 b8 fb ff ff ff c9 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc cc cc cc <0f> 0b b8 e2 ff ff ff eb c2 0f 0b eb a9 44 8b 42 08 68 c7 53 ce b8
[   71.629361] RSP: 0018:ffffcf32c0fdf6a8 EFLAGS: 00010202
[   71.629364] RAX: ffff8f08c8505000 RBX: ffff8f08c67ee800 RCX: 0000000000000000
[   71.629366] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   71.629367] RBP: ffffcf32c0fdf6b0 R08: 0000000000000001 R09: 0000000000000000
[   71.629368] R10: ffff8f08db18b3a8 R11: 0000000000000000 R12: 0000000000000000
[   71.629368] R13: 0000000000000002 R14: 0000000000000a48 R15: ffff8f08c67ee800
[   71.629369] FS: 0000000000000000(0000) GS:ffff8f0a7d273000(0000) knlGS:0000000000000000
[   71.629370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.629371] CR2: 00007b66825905cc CR3: 000000011053d004 CR4: 0000000000772ef0
[   71.629374] PKRU: 55555554
[   71.629374] Call Trace:
[   71.629378] <TASK>
[   71.629382] __ext4_journal_start_sb+0x38/0x1c0
[   71.629383] mpage_prepare_extent_to_map+0x4af/0x580
[   71.629389] ? sbitmap_get+0x73/0x180
[   71.629399] ext4_do_writepages+0x3cc/0x10a0
[   71.629400] ? kvm_sched_clock_read+0x11/0x20
[   71.629409] ext4_writepages+0xc8/0x1b0
[   71.629410] ? ext4_writepages+0xc8/0x1b0
[   71.629411] do_writepages+0xc4/0x180
[   71.629416] __writeback_single_inode+0x45/0x350
[   71.629419] ? _raw_spin_unlock+0xe/0x40
[   71.629423] writeback_sb_inodes+0x260/0x5c0
[   71.629425] ? __schedule+0x4d1/0x1870
[   71.629429] __writeback_inodes_wb+0x54/0x100
[   71.629431] ? queue_io+0x82/0x140
[   71.629433] wb_writeback+0x1ab/0x330
[   71.629448] wb_workfn+0x31d/0x410
[   71.629450] process_one_work+0x191/0x3e0
[   71.629455] worker_thread+0x2e3/0x420

This issue can be easily reproduced by:
mkdir -p mnt
dd if=/dev/zero of=ext4disk bs=1G count=2 oflag=direct
mkfs.ext4 ext4disk
tune2fs -o journal_data ext4disk
mount ext4disk mnt
fio --name=fiotest --rw=randwrite --bs=4k --runtime=3 --ioengine=libaio --iodepth=128 --numjobs=4 --filename=mnt/fiotest --filesize=1G --group_reporting
mount -o remount,ro ext4disk mnt
sync

In data=journal mode, metadata and data are both written to the journal
first, but for the second write, ext4 relies on the writeback thread to
flush the data to the real file location.

After the filesystem is remounted to read only, writeback thread still
writes data to it and causes the issue. Return early to avoid starting
a journal transaction on a read only filesystem, once the filesystem
becomes writable again, the write thread will continue writing data.

Signed-off-by: Gerald Yang <gerald.yang@canonical.com>
---
 fs/ext4/inode.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 15ba4d42982f..4e3bbf17995e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2787,6 +2787,17 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 	if (unlikely(ret))
 		goto out_writepages;
 
+	/*
+	 * For data=journal, if the filesystem was remounted read-only,
+	 * the writeback thread may still write dirty pages to it.
+	 * Return early to avoid starting a journal transaction on a
+	 * read-only filesystem.
+	 */
+	if (ext4_should_journal_data(inode) && sb_rdonly(inode->i_sb)) {
+		ret = -EROFS;
+		goto out_writepages;
+	}
+
 	/*
 	 * If we have inline data and arrive here, it means that
 	 * we will soon create the block for the 1st page, so
-- 
2.43.0
* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
From: Jan Kara @ 2026-01-28 10:22 UTC
To: Gerald Yang; +Cc: tytso, adilger.kernel, jack, linux-ext4, gerald.yang.tw

On Wed 28-01-26 15:45:15, Gerald Yang wrote:
> When remounting the filesystem to read only in data=journal mode
> it may dump the following call trace:
>
> [...]
>
> In data=journal mode, metadata and data are both written to the journal
> first, but for the second write, ext4 relies on the writeback thread to
> flush the data to the real file location.
>
> After the filesystem is remounted to read only, writeback thread still
> writes data to it and causes the issue. Return early to avoid starting
> a journal transaction on a read only filesystem, once the filesystem
> becomes writable again, the write thread will continue writing data.
>
> Signed-off-by: Gerald Yang <gerald.yang@canonical.com>

Thanks for the report and the patch! I can indeed reproduce this warning.
But the patch itself is certainly not the right fix for this problem.
ext4_remount() must make sure there are no dirty pages on the filesystem
anymore when remounting filesystem read only and it apparently fails to do
so. In particular it calls sync_filesystem() which should make sure all
data is written. So this bug needs more investigation why there are some
dirty pages left in the inode in data=journal mode because
ext4_writepages() should have written them all...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
From: Gerald Yang @ 2026-01-29  3:31 UTC
To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Thanks Jan for the review, originally this issue was observed during reboot
because the root filesystem is remounted to read only before shutdown to
make sure all data is flushed to disk.
We don't see any issue on the machine because the data is persisted to
journal. But I think your suggestion is the correct way to fix it, I
will look into
why ext4_writepages doesn't flush data to real file location after calling
sync_filesystem and re-submit the patch for review, thanks again.
* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
From: Jan Kara @ 2026-01-29  9:31 UTC
To: Gerald Yang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4, gerald.yang.tw

On Thu 29-01-26 11:31:43, Gerald Yang wrote:
> Thanks Jan for the review, originally this issue was observed during reboot
> because the root filesystem is remounted to read only before shutdown to
> make sure all data is flushed to disk.
> We don't see any issue on the machine because the data is persisted to
> journal. But I think your suggestion is the correct way to fix it, I
> will look into
> why ext4_writepages doesn't flush data to real file location after calling
> sync_filesystem and re-submit the patch for review, thanks again.

FWIW yesterday I did some investigation and it is always the tail (last
written) folio that is somehow kept dirty. In particular, at the beginning
of ext4_do_writepages() we commit the running transaction and the bh
attached to the folio is just dirty, but by the time we get to
ext4_bio_write_folio() to write it, the bh attached to the tail folio is
already part of the running transaction again and so ext4_bio_write_folio()
fails to write it. I didn't figure out how the bh gets reattached to the
transaction yet. Now I likely won't be able to dig more into this for a few
days so I'm just sharing my findings so far.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
From: Gerald Yang @ 2026-01-30 11:38 UTC
To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Thanks for sharing the findings, I'd also like to share some findings:
I tried to figure out why the buffer is dirty after calling sync_filesystem,
in mpage_prepare_extent_to_map, first I printed folio_test_dirty(folio):

    while (index <= end)
        ...
        for (i = 0; i < nr_folios; i++) {
            ...
            (print if folio is dirty here)

and actually all folios are clean:
            if (!folio_test_dirty(folio) ||
                ...
                folio_unlock(folio);
                continue; <==== continue here without writing anything

Because the call trace happens before going into the above while loop:

    if (ext4_should_journal_data(mpd->inode)) {
        handle = ext4_journal_start(mpd->inode, EXT4_HT_WRITE_PAGE,

it checks if the file system is read only and dumps the call trace in
ext4_journal_check_start, but it doesn't check if there are any real writes
that will happen later in the loop.

To confirm this, first I added 2 more lines in the reproduce script before
remounting read only:
sync <==== it calls ext4_sync_fs to flush all dirty data same as what's
called during remount read only
echo 1 > /proc/sys/vm/drop_caches <==== drop clean page cache
mount -o remount,ro ext4disk mnt

Then I can no longer reproduce the call trace.

Another way I tried was to add drop_pagecache_sb in __ext4_remount:

    if ((bool)(fc->sb_flags & SB_RDONLY) != sb_rdonly(sb)) {
        ...
        if (fc->sb_flags & SB_RDONLY) {
            err = sync_filesystem(sb);
            if (err < 0)
                goto restore_opts;
            (drop page caches for this file system here)

With this, I cannot reproduce the issue either.

But I'm not sure whether dropping the clean page cache after syncing the
filesystem is a proper way to fix the issue, since those cached pages might
still be read.
Any thoughts?

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-01-30 11:38 ` Gerald Yang
@ 2026-02-03 14:50 ` Jan Kara
  2026-02-05 9:25 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2026-02-03 14:50 UTC
  To: Gerald Yang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Hello,

On Fri 30-01-26 19:38:55, Gerald Yang wrote:
> Thanks for sharing the findings, I'd also like to share some findings:
> I tried to figure out why the buffer is dirty after calling sync_filesystem,
> in mpage_prepare_extent_to_map, first I printed folio_test_dirty(folio):
>
> while (index <= end)
>     ...
>     for (i = 0; i < nr_folios; i++) {
>         ...
>         (print if folio is dirty here)
>
> and actually all folios are clean:
>         if (!folio_test_dirty(folio) ||
>             ...
>                 folio_unlock(folio);
>                 continue;   <==== continue here without writing anything
>
> Because the call trace happens before going into the above while loop:
>
>     if (ext4_should_journal_data(mpd->inode)) {
>         handle = ext4_journal_start(mpd->inode, EXT4_HT_WRITE_PAGE,
>
> it checks if the file system is read only and dumps the call trace in
> ext4_journal_check_start, but it doesn't check if there are any real writes
> that will happen later in the loop.
>
> To confirm this, first I added 2 more lines in the reproduce script before
> remounting read only:
> sync      <==== it calls ext4_sync_fs to flush all dirty data same as what's
> called during remount read only
> echo 1 > /proc/sys/vm/drop_caches    <==== drop clean page cache
> mount -o remount,ro ext4disk mnt
>
> Then I can no longer reproduce the call trace.

OK, but ext4_do_writepages() has a check at the beginning:

        if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
                goto out_writepages;

So if there are no dirty pages, mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)
should be false and so we shouldn't go further?

It all looks like some kind of a race because I'm not always able to
reproduce the problem... I'll try to look more into this.

                                                                Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
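
For reference, the modified reproducer Gerald describes in the quoted
message above (the original steps plus a sync and a drop_caches write
before the remount) could be scripted roughly as below. This is only a
sketch: the RUN switch and the run/repro helpers are illustrative names
that do not appear in the thread, and by default the script only prints
the commands, since running them needs root, fio, and about 2G of free
space in the current directory.

```shell
#!/bin/sh
# Sketch of the reproducer from the original report, extended with the two
# extra steps (sync + drop_caches) added before the read-only remount.
# RUN, run() and repro() are hypothetical helper names, not from the thread.

# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$1" || exit 1
    else
        printf '%s\n' "$1"
    fi
}

repro() {
    run 'mkdir -p mnt'
    run 'dd if=/dev/zero of=ext4disk bs=1G count=2 oflag=direct'
    run 'mkfs.ext4 ext4disk'
    run 'tune2fs -o journal_data ext4disk'
    run 'mount ext4disk mnt'
    run 'fio --name=fiotest --rw=randwrite --bs=4k --runtime=3 --ioengine=libaio --iodepth=128 --numjobs=4 --filename=mnt/fiotest --filesize=1G --group_reporting'
    # The two added steps: flush dirty data, then drop the clean page cache.
    run 'sync'
    run 'echo 1 > /proc/sys/vm/drop_caches'
    run 'mount -o remount,ro ext4disk mnt'
    run 'sync'
}

repro
```

Setting RUN=1 (as root) executes the steps instead of printing them.
According to the report quoted above, the call trace could no longer be
reproduced once the two extra steps were in place.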

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-02-03 14:50 ` Jan Kara
@ 2026-02-05 9:25 ` Jan Kara
  2026-02-05 12:59 ` Gerald Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2026-02-05 9:25 UTC
  To: Gerald Yang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4, gerald.yang.tw

On Tue 03-02-26 15:50:43, Jan Kara wrote:
> OK, but ext4_do_writepages() has a check at the beginning:
>
>         if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
>                 goto out_writepages;
>
> So if there are no dirty pages, mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)
> should be false and so we shouldn't go further?
>
> It all looks like some kind of a race because I'm not always able to
> reproduce the problem... I'll try to look more into this.

OK, the race is with checkpointing code writing the buffers while flush
worker tries to writeback the pages. I've posted a patch which fixes the
issue for me.

                                                                Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-02-05 9:25 ` Jan Kara
@ 2026-02-05 12:59 ` Gerald Yang
  2026-03-26 1:50 ` Gerald Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-02-05 12:59 UTC
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Thanks Jan for fixing this issue, I can confirm the patch works for me too.

On Thu, Feb 5, 2026 at 5:25 PM Jan Kara <jack@suse.cz> wrote:
>
> OK, the race is with checkpointing code writing the buffers while flush
> worker tries to writeback the pages. I've posted a patch which fixes the
> issue for me.
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-02-05 12:59 ` Gerald Yang
@ 2026-03-26 1:50 ` Gerald Yang
  2026-03-26 8:54 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-03-26 1:50 UTC
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Hi Jan,

I'd like to ask when this will land in upstream kernel? Thanks.

On Thu, Feb 5, 2026 at 8:59 PM Gerald Yang <gerald.yang@canonical.com> wrote:
>
> Thanks Jan for fixing this issue, I can confirm the patch works for me too.

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-03-26 1:50 ` Gerald Yang
@ 2026-03-26 8:54 ` Jan Kara
  2026-03-27 7:26 ` Gerald Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2026-03-26 8:54 UTC
  To: Gerald Yang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Hi!

On Thu 26-03-26 09:50:39, Gerald Yang wrote:
> I'd like to ask when this will land in upstream kernel? Thanks.

Well, this is more a question to Ted. He didn't pick up the patch [1] to
his tree yet. I guess he'll pick it up for the next merge window which will
start in two weeks or so.

                                                                Honza

[1] https://lore.kernel.org/linux-ext4/20260205092223.21287-2-jack@suse.cz/

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-03-26 8:54 ` Jan Kara
@ 2026-03-27 7:26 ` Gerald Yang
  2026-03-27 15:42 ` Theodore Tso
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-03-27 7:26 UTC
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Thanks Jan!

Hi Ted,

Would this be merged into the next merge window?
https://lore.kernel.org/linux-ext4/20260205092223.21287-2-jack@suse.cz/

Thanks,
Gerald

On Thu, Mar 26, 2026 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi!
>
> On Thu 26-03-26 09:50:39, Gerald Yang wrote:
> > I'd like to ask when this will land in upstream kernel? Thanks.
>
> Well, this is more a question to Ted. He didn't pick up the patch [1] to
> his tree yet. I guess he'll pick it up for the next merge window which will
> start in two weeks or so.
>
>                                                                 Honza
>
> [1] https://lore.kernel.org/linux-ext4/20260205092223.21287-2-jack@suse.cz/

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-03-27 7:26 ` Gerald Yang
@ 2026-03-27 15:42 ` Theodore Tso
  0 siblings, 0 replies; 12+ messages in thread
From: Theodore Tso @ 2026-03-27 15:42 UTC
  To: Gerald Yang; +Cc: Jan Kara, adilger.kernel, linux-ext4, gerald.yang.tw

On Fri, Mar 27, 2026 at 03:26:38PM +0800, Gerald Yang wrote:
>
> Would this be merged into the next merge window?
> https://lore.kernel.org/linux-ext4/20260205092223.21287-2-jack@suse.cz/

Yes, it's in the ext4 git tree already and will be sent along with a
number of other bugfixes to Linus before 7.0 is released.

                                        - Ted

Thread overview: 12+ messages
2026-01-28 7:45 [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode Gerald Yang
2026-01-28 10:22 ` Jan Kara
2026-01-29 3:31   ` Gerald Yang
2026-01-29 9:31     ` Jan Kara
2026-01-30 11:38       ` Gerald Yang
2026-02-03 14:50         ` Jan Kara
2026-02-05 9:25           ` Jan Kara
2026-02-05 12:59             ` Gerald Yang
2026-03-26 1:50               ` Gerald Yang
2026-03-26 8:54                 ` Jan Kara
2026-03-27 7:26                   ` Gerald Yang
2026-03-27 15:42                     ` Theodore Tso