[PATCH] ext4: Fix call trace when remounting to read only in data=journal mode

public inbox for linux-ext4@vger.kernel.org

* [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
@ 2026-01-28  7:45 Gerald Yang
  2026-01-28 10:22 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-01-28  7:45 UTC
  To: tytso, adilger.kernel, jack; +Cc: linux-ext4, gerald.yang.tw, Gerald Yang

Remounting a filesystem read-only in data=journal mode may trigger
the following warning and call trace:

[   71.629350] CPU: 0 UID: 0 PID: 177 Comm: kworker/u96:5 Tainted: G            E       6.19.0-rc7 #1 PREEMPT(voluntary)
[   71.629352] Tainted: [E]=UNSIGNED_MODULE
[   71.629353] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
[   71.629354] Workqueue: writeback wb_workfn (flush-7:4)
[   71.629359] RIP: 0010:ext4_journal_check_start+0x8b/0xd0
[   71.629360] Code: 31 ff 45 31 c0 45 31 c9 e9 42 ad c4 00 48 8b 5d f8 b8 fb ff ff ff c9 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc cc cc cc <0f> 0b b8 e2 ff ff ff eb c2 0f 0b eb a9 44 8b 42 08 68 c7 53 ce b8
[   71.629361] RSP: 0018:ffffcf32c0fdf6a8 EFLAGS: 00010202
[   71.629364] RAX: ffff8f08c8505000 RBX: ffff8f08c67ee800 RCX: 0000000000000000
[   71.629366] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   71.629367] RBP: ffffcf32c0fdf6b0 R08: 0000000000000001 R09: 0000000000000000
[   71.629368] R10: ffff8f08db18b3a8 R11: 0000000000000000 R12: 0000000000000000
[   71.629368] R13: 0000000000000002 R14: 0000000000000a48 R15: ffff8f08c67ee800
[   71.629369] FS:  0000000000000000(0000) GS:ffff8f0a7d273000(0000) knlGS:0000000000000000
[   71.629370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.629371] CR2: 00007b66825905cc CR3: 000000011053d004 CR4: 0000000000772ef0
[   71.629374] PKRU: 55555554
[   71.629374] Call Trace:
[   71.629378]  <TASK>
[   71.629382]  __ext4_journal_start_sb+0x38/0x1c0
[   71.629383]  mpage_prepare_extent_to_map+0x4af/0x580
[   71.629389]  ? sbitmap_get+0x73/0x180
[   71.629399]  ext4_do_writepages+0x3cc/0x10a0
[   71.629400]  ? kvm_sched_clock_read+0x11/0x20
[   71.629409]  ext4_writepages+0xc8/0x1b0
[   71.629410]  ? ext4_writepages+0xc8/0x1b0
[   71.629411]  do_writepages+0xc4/0x180
[   71.629416]  __writeback_single_inode+0x45/0x350
[   71.629419]  ? _raw_spin_unlock+0xe/0x40
[   71.629423]  writeback_sb_inodes+0x260/0x5c0
[   71.629425]  ? __schedule+0x4d1/0x1870
[   71.629429]  __writeback_inodes_wb+0x54/0x100
[   71.629431]  ? queue_io+0x82/0x140
[   71.629433]  wb_writeback+0x1ab/0x330
[   71.629448]  wb_workfn+0x31d/0x410
[   71.629450]  process_one_work+0x191/0x3e0
[   71.629455]  worker_thread+0x2e3/0x420

This issue can be easily reproduced by:
mkdir -p mnt
dd if=/dev/zero of=ext4disk bs=1G count=2 oflag=direct
mkfs.ext4 ext4disk
tune2fs -o journal_data ext4disk
mount ext4disk mnt
fio --name=fiotest --rw=randwrite --bs=4k --runtime=3 --ioengine=libaio --iodepth=128 --numjobs=4 --filename=mnt/fiotest --filesize=1G --group_reporting
mount -o remount,ro ext4disk mnt
sync

In data=journal mode, both metadata and data are written to the journal
first. The second write, which puts the data at its real file location,
is left to the writeback thread.

After the filesystem is remounted read-only, the writeback thread may
still try to write data to it, which triggers the warning. Return early
to avoid starting a journal transaction on a read-only filesystem. Once
the filesystem becomes writable again, the writeback thread will
continue writing the data.

Signed-off-by: Gerald Yang <gerald.yang@canonical.com>
---
 fs/ext4/inode.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 15ba4d42982f..4e3bbf17995e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2787,6 +2787,17 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 	if (unlikely(ret))
 		goto out_writepages;
 
+	/*
+	 * For data=journal, if the filesystem was remounted read-only,
+	 * the writeback thread may still write dirty pages to it.
+	 * Return early to avoid starting a journal transaction on a
+	 * read-only filesystem.
+	 */
+	if (ext4_should_journal_data(inode) && sb_rdonly(inode->i_sb)) {
+		ret = -EROFS;
+		goto out_writepages;
+	}
+
 	/*
 	 * If we have inline data and arrive here, it means that
 	 * we will soon create the block for the 1st page, so
-- 
2.43.0



* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-01-28  7:45 [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode Gerald Yang
@ 2026-01-28 10:22 ` Jan Kara
  2026-01-29  3:31   ` Gerald Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2026-01-28 10:22 UTC
  To: Gerald Yang; +Cc: tytso, adilger.kernel, jack, linux-ext4, gerald.yang.tw

On Wed 28-01-26 15:45:15, Gerald Yang wrote:
> Remounting a filesystem read-only in data=journal mode may trigger
> the following warning and call trace:
> 
> [   71.629350] CPU: 0 UID: 0 PID: 177 Comm: kworker/u96:5 Tainted: G            E       6.19.0-rc7 #1 PREEMPT(voluntary)
> [   71.629352] Tainted: [E]=UNSIGNED_MODULE
> [   71.629353] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
> [   71.629354] Workqueue: writeback wb_workfn (flush-7:4)
> [   71.629359] RIP: 0010:ext4_journal_check_start+0x8b/0xd0
> [   71.629360] Code: 31 ff 45 31 c0 45 31 c9 e9 42 ad c4 00 48 8b 5d f8 b8 fb ff ff ff c9 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc cc cc cc <0f> 0b b8 e2 ff ff ff eb c2 0f 0b eb a9 44 8b 42 08 68 c7 53 ce b8
> [   71.629361] RSP: 0018:ffffcf32c0fdf6a8 EFLAGS: 00010202
> [   71.629364] RAX: ffff8f08c8505000 RBX: ffff8f08c67ee800 RCX: 0000000000000000
> [   71.629366] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [   71.629367] RBP: ffffcf32c0fdf6b0 R08: 0000000000000001 R09: 0000000000000000
> [   71.629368] R10: ffff8f08db18b3a8 R11: 0000000000000000 R12: 0000000000000000
> [   71.629368] R13: 0000000000000002 R14: 0000000000000a48 R15: ffff8f08c67ee800
> [   71.629369] FS:  0000000000000000(0000) GS:ffff8f0a7d273000(0000) knlGS:0000000000000000
> [   71.629370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   71.629371] CR2: 00007b66825905cc CR3: 000000011053d004 CR4: 0000000000772ef0
> [   71.629374] PKRU: 55555554
> [   71.629374] Call Trace:
> [   71.629378]  <TASK>
> [   71.629382]  __ext4_journal_start_sb+0x38/0x1c0
> [   71.629383]  mpage_prepare_extent_to_map+0x4af/0x580
> [   71.629389]  ? sbitmap_get+0x73/0x180
> [   71.629399]  ext4_do_writepages+0x3cc/0x10a0
> [   71.629400]  ? kvm_sched_clock_read+0x11/0x20
> [   71.629409]  ext4_writepages+0xc8/0x1b0
> [   71.629410]  ? ext4_writepages+0xc8/0x1b0
> [   71.629411]  do_writepages+0xc4/0x180
> [   71.629416]  __writeback_single_inode+0x45/0x350
> [   71.629419]  ? _raw_spin_unlock+0xe/0x40
> [   71.629423]  writeback_sb_inodes+0x260/0x5c0
> [   71.629425]  ? __schedule+0x4d1/0x1870
> [   71.629429]  __writeback_inodes_wb+0x54/0x100
> [   71.629431]  ? queue_io+0x82/0x140
> [   71.629433]  wb_writeback+0x1ab/0x330
> [   71.629448]  wb_workfn+0x31d/0x410
> [   71.629450]  process_one_work+0x191/0x3e0
> [   71.629455]  worker_thread+0x2e3/0x420
> 
> This issue can be easily reproduced by:
> mkdir -p mnt
> dd if=/dev/zero of=ext4disk bs=1G count=2 oflag=direct
> mkfs.ext4 ext4disk
> tune2fs -o journal_data ext4disk
> mount ext4disk mnt
> fio --name=fiotest --rw=randwrite --bs=4k --runtime=3 --ioengine=libaio --iodepth=128 --numjobs=4 --filename=mnt/fiotest --filesize=1G --group_reporting
> mount -o remount,ro ext4disk mnt
> sync
> 
> In data=journal mode, both metadata and data are written to the journal
> first. The second write, which puts the data at its real file location,
> is left to the writeback thread.
> 
> After the filesystem is remounted read-only, the writeback thread may
> still try to write data to it, which triggers the warning. Return early
> to avoid starting a journal transaction on a read-only filesystem. Once
> the filesystem becomes writable again, the writeback thread will
> continue writing the data.
> 
> Signed-off-by: Gerald Yang <gerald.yang@canonical.com>

Thanks for the report and the patch! I can indeed reproduce this warning.
But the patch itself is certainly not the right fix for this problem.
ext4_remount() must make sure there are no dirty pages left on the
filesystem when remounting it read-only, and it apparently fails to do
so. In particular, it calls sync_filesystem(), which should make sure all
data is written. So this bug needs more investigation into why some
dirty pages are left in the inode in data=journal mode, because
ext4_writepages() should have written them all...

								Honza

> ---
>  fs/ext4/inode.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 15ba4d42982f..4e3bbf17995e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2787,6 +2787,17 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
>  	if (unlikely(ret))
>  		goto out_writepages;
>  
> +	/*
> +	 * For data=journal, if the filesystem was remounted read-only,
> +	 * the writeback thread may still write dirty pages to it.
> +	 * Return early to avoid starting a journal transaction on a
> +	 * read-only filesystem.
> +	 */
> +	if (ext4_should_journal_data(inode) && sb_rdonly(inode->i_sb)) {
> +		ret = -EROFS;
> +		goto out_writepages;
> +	}
> +
>  	/*
>  	 * If we have inline data and arrive here, it means that
>  	 * we will soon create the block for the 1st page, so
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-01-28 10:22 ` Jan Kara
@ 2026-01-29  3:31   ` Gerald Yang
  2026-01-29  9:31     ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-01-29  3:31 UTC
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Thanks Jan for the review. This issue was originally observed during
reboot, because the root filesystem is remounted read-only before
shutdown to make sure all data is flushed to disk.
We don't see any problem on the machine because the data is persisted to
the journal. But I think your suggestion is the correct way to fix it. I
will look into why ext4_writepages doesn't flush the data to its real
file location after sync_filesystem is called, and re-submit the patch
for review. Thanks again.




* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-01-29  3:31   ` Gerald Yang
@ 2026-01-29  9:31     ` Jan Kara
  2026-01-30 11:38       ` Gerald Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2026-01-29  9:31 UTC
  To: Gerald Yang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4, gerald.yang.tw

On Thu 29-01-26 11:31:43, Gerald Yang wrote:
> Thanks Jan for the review. This issue was originally observed during
> reboot, because the root filesystem is remounted read-only before
> shutdown to make sure all data is flushed to disk.
> We don't see any problem on the machine because the data is persisted to
> the journal. But I think your suggestion is the correct way to fix it. I
> will look into why ext4_writepages doesn't flush the data to its real
> file location after sync_filesystem is called, and re-submit the patch
> for review. Thanks again.

FWIW yesterday I did some investigation and it is always the tail (last
written) folio that is somehow kept dirty. In particular, at the beginning
of ext4_do_writepages() we commit the running transaction and the bh
attached to the folio is just dirty, but by the time we get to
ext4_bio_write_folio() to write it, the bh attached to the tail folio is
already part of the running transaction again and so ext4_bio_write_folio()
fails to write it. I haven't figured out yet how the bh gets reattached to
the transaction. I likely won't be able to dig more into this for a few
days, so I'm just sharing my findings so far.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-01-29  9:31     ` Jan Kara
@ 2026-01-30 11:38       ` Gerald Yang
  2026-02-03 14:50         ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-01-30 11:38 UTC
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Thanks for sharing the findings. I'd also like to share some of mine.
I tried to figure out why the buffer is dirty after calling sync_filesystem.
In mpage_prepare_extent_to_map, I first printed folio_test_dirty(folio):

while (index <= end)
    ...
    for (i = 0; i < nr_folios; i++) {
        ...
        (print if folio is dirty here)

and actually all folios are clean:
if (!folio_test_dirty(folio) ||
    ...
    folio_unlock(folio);
    continue;       <==== continue here without writing anything

The call trace is triggered before the while loop above is entered:

if (ext4_should_journal_data(mpd->inode)) {
    handle = ext4_journal_start(mpd->inode, EXT4_HT_WRITE_PAGE,

ext4_journal_start ends up in ext4_journal_check_start, which sees that
the file system is read-only and dumps the call trace. It does not check
whether any real writes will happen later in the loop.

To confirm this, I first added two lines to the reproducer script before
the read-only remount:
sync      <==== calls ext4_sync_fs to flush all dirty data, the same as
                what is called during the read-only remount
echo 1 > /proc/sys/vm/drop_caches       <==== drop the clean page cache
mount -o remount,ro ext4disk mnt

Then I can no longer reproduce the call trace.

Another way I tried was to add drop_pagecache_sb in __ext4_remount:

if ((bool)(fc->sb_flags & SB_RDONLY) != sb_rdonly(sb)) {
    ...
    if (fc->sb_flags & SB_RDONLY) {
        err = sync_filesystem(sb);
        if (err < 0)
            goto restore_opts;
        (drop page caches for this file system here)

With this, I cannot reproduce the issue either. But I'm not sure whether
dropping the clean page cache after syncing the file system is a proper
way to fix the issue, since those pages might still be read. Any thoughts?
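
The ordering described above can be sketched as a toy user-space model
in C. This is only an illustration under stated assumptions, not kernel
code: every name below (toy_sb, toy_journal_start,
toy_writepages_handle_first, toy_writepages_scan_first) is hypothetical,
the read-only check merely stands in for the one made in
ext4_journal_check_start, and the scan-first variant exists purely for
contrast and is not a proposed fix.

```c
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_sb {
	bool rdonly;	/* stands in for sb_rdonly(sb) */
	int warnings;	/* counts how often the modelled warning fires */
};

/* Models the read-only check made when a journal handle is started. */
bool toy_journal_start(struct toy_sb *sb)
{
	if (sb->rdonly) {
		sb->warnings++;	/* the call trace would be dumped here */
		return false;
	}
	return true;
}

/* Ordering described in the mail: start the handle, then scan folios. */
int toy_writepages_handle_first(struct toy_sb *sb, const bool *dirty, int n)
{
	int written = 0;

	if (!toy_journal_start(sb))
		return -1;
	for (int i = 0; i < n; i++)
		if (dirty[i])
			written++;	/* clean folios are skipped */
	return written;
}

/* Contrast case (assumption): only start a handle if something is dirty. */
int toy_writepages_scan_first(struct toy_sb *sb, const bool *dirty, int n)
{
	int written = 0;

	for (int i = 0; i < n; i++)
		if (dirty[i])
			written++;
	if (written == 0)
		return 0;	/* nothing to write, no handle needed */
	if (!toy_journal_start(sb))
		return -1;
	return written;
}
```

With a read-only toy_sb and an array of all-clean folios, the
handle-first variant records one warning even though nothing would be
written, while the scan-first variant records none.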



* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-01-30 11:38       ` Gerald Yang
@ 2026-02-03 14:50         ` Jan Kara
  2026-02-05  9:25           ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2026-02-03 14:50 UTC (permalink / raw)
  To: Gerald Yang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Hello,

On Fri 30-01-26 19:38:55, Gerald Yang wrote:
> Thanks for sharing the findings, I'd also like to share some findings:
> I tried to figure out why the buffer is dirty after calling sync_filesystem,
> in mpage_prepare_extent_to_map, first I printed folio_test_dirty(folio):
> 
> while (index <= end)
>     ...
>     for (i = 0; i < nr_folios; i++) {
>         ...
>         (print if folio is dirty here)
> 
> and actually all folios are clean:
> if (!folio_test_dirty(folio) ||
>     ...
>     folio_unlock(folio);
>     continue;       <==== continue here without writing anything
> 
> Because the call trace happens before going into the above while loop:
> 
> if (ext4_should_journal_data(mpd->inode)) {
>     handle = ext4_journal_start(mpd->inode, EXT4_HT_WRITE_PAGE,
> 
> it checks if the file system is read only and dumps the call trace in
> ext4_journal_check_start, but it doesn't check if there are any real writes
> that will happen later in the loop.
> 
> To confirm this, first I added 2 more lines in the reproduce script before
> remounting read only:
> sync      <==== it calls ext4_sync_fs to flush all dirty data same as what's
>                          called during remount read only
> echo 1 > /proc/sys/vm/drop_caches       <==== drop clean page cache
> mount -o remount,ro ext4disk mnt
> 
> Then I can no longer reproduce the call trace.

OK, but ext4_do_writepages() has a check at the beginning:

        if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
                goto out_writepages;

So if there are no dirty pages, mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)
should be false and so we shouldn't go further?

It all looks like some kind of a race because I'm not always able to
reproduce the problem... I'll try to look more into this.
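
To make the question concrete, here is a minimal userspace sketch in C
(not kernel code) of the ordering discussed above: the early exit on
nrpages / dirty tag, then the journal start in data=journal mode, and only
then the per-folio dirty test. All toy_* names are hypothetical stand-ins,
and the assumption that the dirty tag can be set while every folio tests
clean is only what the observations above seem to suggest, not a confirmed
diagnosis.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel state; none of these are real kernel types. */
struct toy_mapping {
	size_t nrpages;          /* models mapping->nrpages */
	bool tagged_dirty;       /* models mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) */
	const bool *folio_dirty; /* models folio_test_dirty() for each folio */
};

enum toy_result {
	TOY_SKIPPED_EARLY, /* left at the nrpages / dirty-tag check */
	TOY_WARN_RDONLY,   /* journal start attempted on a read-only fs */
	TOY_WROTE_NOTHING, /* got past the journal start, every folio was clean */
	TOY_WROTE_DATA,    /* at least one dirty folio would be written */
};

/* Hypothetical model of the order of checks, not the real function. */
enum toy_result toy_do_writepages(const struct toy_mapping *m,
				  bool journal_data, bool fs_rdonly)
{
	size_t i, written = 0;

	/* Early exit: no pages at all, or nothing tagged dirty. */
	if (!m->nrpages || !m->tagged_dirty)
		return TOY_SKIPPED_EARLY;

	/*
	 * In data=journal mode the handle is started before any folio is
	 * examined, so the read-only check fires regardless of folio state.
	 */
	if (journal_data && fs_rdonly)
		return TOY_WARN_RDONLY;

	/* Only now are individual folios tested; clean ones are skipped. */
	for (i = 0; i < m->nrpages; i++) {
		if (!m->folio_dirty[i])
			continue;
		written++;
	}
	return written ? TOY_WROTE_DATA : TOY_WROTE_NOTHING;
}
```

In this toy model, a mapping with the dirty tag set but only clean folios
yields TOY_WARN_RDONLY on a read-only filesystem, while the same mapping
with the tag cleared leaves at the first check.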

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-02-03 14:50         ` Jan Kara
@ 2026-02-05  9:25           ` Jan Kara
  2026-02-05 12:59             ` Gerald Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2026-02-05  9:25 UTC (permalink / raw)
  To: Gerald Yang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4, gerald.yang.tw

On Tue 03-02-26 15:50:43, Jan Kara wrote:
> It all looks like some kind of a race because I'm not always able to
> reproduce the problem... I'll try to look more into this.

OK, the race is with the checkpointing code writing the buffers while the
flush worker tries to write back the pages. I've posted a patch which fixes
the issue for me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-02-05  9:25           ` Jan Kara
@ 2026-02-05 12:59             ` Gerald Yang
  2026-03-26  1:50               ` Gerald Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-02-05 12:59 UTC (permalink / raw)
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Thanks Jan for fixing this issue. I can confirm the patch works for me too.


* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-02-05 12:59             ` Gerald Yang
@ 2026-03-26  1:50               ` Gerald Yang
  2026-03-26  8:54                 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-03-26  1:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Hi Jan,

I'd like to ask when this will land in the upstream kernel. Thanks.

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-03-26  1:50               ` Gerald Yang
@ 2026-03-26  8:54                 ` Jan Kara
  2026-03-27  7:26                   ` Gerald Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2026-03-26  8:54 UTC (permalink / raw)
  To: Gerald Yang; +Cc: Jan Kara, tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Hi!

On Thu 26-03-26 09:50:39, Gerald Yang wrote:
> I'd like to ask when this will land in the upstream kernel. Thanks.

Well, this is more a question for Ted. He hasn't picked up the patch [1]
into his tree yet. I guess he'll pick it up for the next merge window, which
will start in two weeks or so.

								Honza

[1] https://lore.kernel.org/linux-ext4/20260205092223.21287-2-jack@suse.cz/

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-03-26  8:54                 ` Jan Kara
@ 2026-03-27  7:26                   ` Gerald Yang
  2026-03-27 15:42                     ` Theodore Tso
  0 siblings, 1 reply; 12+ messages in thread
From: Gerald Yang @ 2026-03-27  7:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4, gerald.yang.tw

Thanks Jan!

Hi Ted,

Will this be merged in the next merge window?
https://lore.kernel.org/linux-ext4/20260205092223.21287-2-jack@suse.cz/

Thanks,
Gerald

* Re: [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode
  2026-03-27  7:26                   ` Gerald Yang
@ 2026-03-27 15:42                     ` Theodore Tso
  0 siblings, 0 replies; 12+ messages in thread
From: Theodore Tso @ 2026-03-27 15:42 UTC (permalink / raw)
  To: Gerald Yang; +Cc: Jan Kara, adilger.kernel, linux-ext4, gerald.yang.tw

On Fri, Mar 27, 2026 at 03:26:38PM +0800, Gerald Yang wrote:
> 
> Will this be merged in the next merge window?
> https://lore.kernel.org/linux-ext4/20260205092223.21287-2-jack@suse.cz/

Yes, it's in the ext4 git tree already and will be sent along with a
number of other bugfixes to Linus before 7.0 is released.

						- Ted

end of thread, other threads:[~2026-03-27 15:42 UTC | newest]

Thread overview: 12+ messages
2026-01-28  7:45 [PATCH] ext4: Fix call trace when remounting to read only in data=journal mode Gerald Yang
2026-01-28 10:22 ` Jan Kara
2026-01-29  3:31   ` Gerald Yang
2026-01-29  9:31     ` Jan Kara
2026-01-30 11:38       ` Gerald Yang
2026-02-03 14:50         ` Jan Kara
2026-02-05  9:25           ` Jan Kara
2026-02-05 12:59             ` Gerald Yang
2026-03-26  1:50               ` Gerald Yang
2026-03-26  8:54                 ` Jan Kara
2026-03-27  7:26                   ` Gerald Yang
2026-03-27 15:42                     ` Theodore Tso
